K

K is a rewrite-based
executable semantic framework in which programming languages, type
systems, and formal analysis tools can be defined using configurations
and rules. Configurations organize the state in units called cells,
which are labeled and can be nested. K rewrite rules make it explicit
which parts of the term are read-only, write-only, read-write, or
unused. This makes K suitable for defining truly concurrent languages
even in the presence of sharing. Computations are represented as
syntactic extensions of the original language abstract syntax, using a
nested list structure which sequentializes computational tasks, such
as program fragments. Computations are like any other terms in a
rewriting environment: they can be matched, moved from one place to
another, modified, or deleted. This makes K suitable for defining
control-intensive features such as abrupt termination, exceptions, or
call/cc.

K Tool Download

  • The provided K Tool Binaries are supported on Linux, OS X, and Windows. Other platforms may or may not work correctly. We welcome reports about usability on unsupported platforms and bug reports on supported ones.
  • Try our Editor Support page for links to K syntax highlighting definitions for various popular editors/IDEs. Please feel free to contribute.
  • The source code (Java) is available on GitHub, where you can also report bugs (please do so).

Learn K

  • K webpage at UAIC (Romania).
  • Matching logic webpage at UIUC (USA).
  • Online K Discussion Channel for K users (Slack & Riot). This is the recommended way to ask questions about K and interact with the K community.
  • Stackoverflow for general questions to the K user community (use the channel above if you want quick answers).

K Tool Binaries

Download the latest stable release or prior releases.

K Tutorial

The purpose of this series of lessons is to teach developers how to program in
K. While the primary use of K is in specifying the operational semantics of
programming languages, this tutorial is agnostic about how that knowledge of K
is applied. For a more detailed tutorial explaining the basic principles of
programming language design, refer to the
K PL Tutorial. Note that the K PL Tutorial is currently somewhat out of date.

This K tutorial is a work in progress. Many lessons are currently simply
placeholders for future content.

To start the K tutorial, begin with
Section 1: Basic K Concepts.

Section 1: Basic K Concepts

The goal of this first section of the K tutorial is to teach the basic
principles of K to someone with no prior experience with K as a programming
language. However, it is not written for complete beginners to programming.
We assume that the reader has a firm grounding in computer science broadly,
and that they have experience writing code in functional programming
languages.

By the end of this section, the reader ought to be able to write specifications
of simple languages in K, use these specifications to generate a fast
interpreter for their programming language, as well as write basic deductive
program verification proofs over programs in their language. This should give
them the theoretical grounding they need to begin expanding their knowledge
of K in Section 2: Intermediate K Concepts.

To begin this section, refer to
Lesson 1.1: Setting up a K Environment.

Lesson 1.1: Setting up a K Environment

The first step to learning K is to install K on your system, and configure your
editor for K development.

Installing K

You have two options for how to install K, depending on how you intend to
interact with the K codebase. If you are solely a user of K, and have no
interest in developing or making changes to K, you most likely will want to
install one of our binary releases of K. However, if you are going to be a K
developer, or simply want to build K from source, you should follow the
instructions for a source build of K.

Installing K from a binary release

K is developed as a rolling release, with each change to K that passes our
CI infrastructure being deployed on GitHub for download. The latest release of
K can be downloaded here.
This page also contains information on how to install K. It is recommended
that you fully uninstall the old version of K prior to installing the new one,
as K does not maintain entries in package manager databases, with the exception
of Homebrew on macOS.

Installing K from source

You can clone K from GitHub with the following Git command:

git clone https://github.com/runtimeverification/k --recursive

Instructions on how to build K from source can be found
here.

Configuring your editor

K maintains a set of scripts for a variety of text editors, including vim and
emacs, in various states of maintenance. You can download these scripts with
the following Git command:

git clone https://github.com/kframework/k-editor-support

Because K allows users to define their own grammars for parsing K itself,
not all features of K can be highlighted reliably. However, at the cost of
occasionally highlighting things incorrectly, these scripts give quite good
results in many cases. That said, some of the editor scripts in the
above repository are fairly out of date. If you manage to improve them, we
welcome pull requests into the repository.

Troubleshooting

If you have problems installing K, we encourage you to reach out to us. If you
follow the above install instructions and run into a problem, you can
create a bug report on GitHub.

Next lesson

Once you have set up K on your system to your satisfaction, you can continue to
Lesson 1.2: Basics of Functional K.

Lesson 1.2: Basics of Functional K

The purpose of this lesson is to explain the basics of productions and
rules in K. These are two types of K sentences. A K file consists of
one or more requires or modules in K. Each module consists of one or
more imports or sentences. For more information on requires, modules, and
sentences, refer to Lesson 1.5. However, for the time
being, just think of a module as a container for sentences, and don't worry
about requires or imports just yet.

Our first K program

To start with, input the following program into your editor as file
lesson-02-a.k:

module LESSON-02-A

  syntax Color ::= Yellow() | Blue()
  syntax Fruit ::= Banana() | Blueberry()
  syntax Color ::= colorOf(Fruit) [function]

  rule colorOf(Banana()) => Yellow()
  rule colorOf(Blueberry()) => Blue()

endmodule

Save this file and then run:

kompile lesson-02-a.k

kompile is K's compiler. By default, it takes a program or specification
written in K and compiles it into an interpreter for that input. Right now we
are compiling a single file. A set of K files that are compiled together are
called a K definition. We will cover multiple file K definitions later on.
kompile will output a directory containing everything needed to execute
programs and perform proofs using that definition. In this case, kompile will
(by default) create the directory lesson-02-a-kompiled under the current
directory.

Now, save the following input file in your editor as banana.color in the same
directory as lesson-02-a.k:

colorOf(Banana())

We can now evaluate this K term by running (from the same directory):

krun banana.color

krun will use the interpreter generated by the first call to kompile to
execute this program.

You will get the following output:

<k>
  Yellow ( ) ~> .
</k>

For now, don't worry about the <k>, </k>, or ~> . portions of this
output.

You can also execute small programs directly by specifying them on the command
line instead of putting them in a file. For example, the same program above
could also have been executed by running the following command:

krun -cPGM='colorOf(Banana())'

Now, let's look at what this definition and program did.

Productions, Constructors, and Functions

The first thing to realize is that this K definition contains 5 productions.
Productions are introduced with the syntax keyword, followed by a sort,
followed by the ::= operator, followed by the definition of one or more
productions, separated by the | operator. There are different
types of productions, but for now we only care about constructors and
functions. Each declaration separated by the | operator is individually
a single production, and the | symbol simply groups together productions that
have the same sort. For example, we could equally have written an identical K
definition like so:

module LESSON-02-B

  syntax Color ::= Yellow()
  syntax Color ::= Blue()
  syntax Fruit ::= Banana()
  syntax Fruit ::= Blueberry()
  syntax Color ::= colorOf(Fruit) [function]

  rule colorOf(Banana()) => Yellow()
  rule colorOf(Blueberry()) => Blue()

endmodule

Or even:

module LESSON-02-C

  syntax Color ::= Yellow()
                 | Blue()
                 | colorOf(Fruit) [function]
  syntax Fruit ::= Banana()
                 | Blueberry()

  rule colorOf(Banana()) => Yellow()
  rule colorOf(Blueberry()) => Blue()

endmodule

Each of the production types named above has the same underlying syntax,
but context and attributes are used to distinguish between the different
types. Tokens, brackets, lists, macros, aliases, and anywhere productions will
be covered in a later lesson, but this lesson does introduce us to constructors
and functions. Yellow(), Blue(), Banana(), and Blueberry() are
constructors. You can think of a constructor like a constructor for an
algebraic data type, if you're familiar with a functional language. The data
type itself is the sort that appears on the left of the ::= operator. Sorts
in K consist of uppercase identifiers.

Constructors can have arguments, but these ones do not. We will cover the
syntax of productions in detail in the next lesson, but for now, you can write
a production with no arguments as an uppercase or lowercase identifier followed
by the () operator.

A function is distinguished from a constructor by the presence of the
function attribute. Attributes appear in a comma separated list between
square brackets after any sentence, including both productions and rules.
Various attributes with built-in meanings exist in K and will be discussed
throughout the tutorial.

Exercise

Use krun to compute the return value of the colorOf function on a
Blueberry().

Rules, Matching, and Variables

Functions in K are given definitions using rules. A rule begins with the rule
keyword and contains at least one rewrite operator. The rewrite operator
is represented by the syntax =>. The rewrite operator is one of the built-in
productions in K, and we will discuss in more detail how it can be used in
future lessons, but for now, you can think of a rule as consisting of a
left-hand side and a right-hand side, separated by the rewrite
operator. On the left-hand side is the name of the function and zero or more
patterns corresponding to the parameters of the function. On the right-hand
side is another pattern. The meaning of the rule is relatively simple, having
defined these components. If the function is called with arguments that
match the patterns on the left-hand side, then the return value of the
function is the pattern on the right-hand side.

For example, in the above example, if the argument of the colorOf function
is Banana(), then the return value of the function is Yellow().

So far we have introduced that a constructor is a type of pattern in K. We
will introduce more complex patterns in later lessons, but there is one other
type of basic pattern: the variable. A variable, syntactically, consists
of an uppercase identifier. However, unlike a constructor, a variable will
match any term, with one exception: two variables with the same name
must match the same term.

Here is a more complex example (lesson-02-d.k):

module LESSON-02-D

  syntax Container ::= Jar(Fruit)
  syntax Fruit ::= Apple() | Pear()

  syntax Fruit ::= contentsOfJar(Container) [function]

  rule contentsOfJar(Jar(F)) => F

endmodule

Here we see that Jar is a constructor with a single argument. You can write a
production with multiple arguments by putting the sorts of the arguments in a
comma-separated list inside the parentheses.

In this example, F is a variable. It will match either Apple() or Pear().
The return value of the function is created by substituting the matched
values of all of the variables into the variables on the right-hand side of
the rule.

To demonstrate, compile this definition and execute the following program with
krun:

contentsOfJar(Jar(Apple()))

You will see when you run it that the program returns Apple(), because that
is the pattern that was matched by F.
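
As mentioned above, a production can take multiple arguments. Here is a minimal
sketch of what that looks like (the module name and the Basket constructor are
ours, not part of the lesson files):

module LESSON-02-E
  syntax Fruit ::= Apple() | Pear()
  syntax Container ::= Basket(Fruit, Fruit)

  syntax Fruit ::= firstOf(Container) [function]

  // _ is an anonymous variable: it matches anything and discards the value
  rule firstOf(Basket(F, _)) => F
endmodule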

Exercises

  1. Extend the definition in lesson-02-a.k with the addition of blackberries
    and kiwis. For simplicity, blackberries are black and kiwis are green. Then
    compile your definition and test that your additional fruits are correctly
    handled by the colorOf function.
  2. Create a new definition which defines an outfit as a multi-argument
    constructor consisting of a hat, shirt, pants, and shoes. Define a new sort,
    Boolean, with two constructors, true and false. Each of hat, shirt, pants,
    and shoes will have a single argument (a color), either black or
    white. Then define an outfitMatching function that will return true if all
    the pieces of the outfit are the same color. You do not need to define the
    case that returns false. Write some tests that your function behaves the way
    you expect.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.3: BNF Syntax and Parser Generation.

Lesson 1.3: BNF Syntax and Parser Generation

The purpose of this lesson is to explain the full syntax and semantics of
productions in K, as well as how productions and other syntactic
sentences can be used to define grammars for parsing both rules
and programs.

K's approach to parsing

K's grammar is divided into two components: the outer syntax of K and the
inner syntax of K. Outer syntax refers to the parsing of requires,
modules, imports, and sentences in a K definition. Inner syntax
refers to the parsing of rules and programs. Unlike the outer syntax of
K, which is predetermined, much of the inner syntax of K is defined by you, the
developer. When rules or programs are parsed, they are parsed within the
context of a module. Rules are parsed in the context of the module in which
they exist, whereas programs are parsed in the context of the
main syntax module of a K definition. The productions and other syntactic
sentences in a module are used to construct the grammar of the module, which
is then used to perform parsing.

Basic BNF productions

To illustrate how this works, we will consider a simple K definition which
defines a relatively basic calculator capable of evaluating Boolean expressions
containing and, or, not, and xor.

Input the following program into your editor as file lesson-03-a.k:

module LESSON-03-A

  syntax Boolean ::= "true" | "false"
                   | "!" Boolean [function]
                   | Boolean "&&" Boolean [function]
                   | Boolean "^" Boolean [function]
                   | Boolean "||" Boolean [function]

endmodule

You will notice that the productions in this file look a little different than
the ones from the previous lesson. In point of fact, K has two different
mechanisms for defining productions. We have previously been focused
exclusively on the first mechanism, where the ::= symbol is followed by an
alphanumeric identifier followed by a comma-separated list of sorts in
parentheses. However, this is merely a special case of a more generic mechanism
for defining the syntax of productions using a variant of
BNF Form.

For example, in the previous lesson, we had the following set of productions:

module LESSON-03-B
  syntax Color ::= Yellow() | Blue()
  syntax Fruit ::= Banana() | Blueberry()
  syntax Color ::= colorOf(Fruit) [function]
endmodule

It turns out that this is equivalent to the following definition which defines
the same grammar, but using BNF notation:

module LESSON-03-C
  syntax Color ::= "Yellow" "(" ")" | "Blue" "(" ")"
  syntax Fruit ::= "Banana" "(" ")" | "Blueberrry" "(" ")"
  syntax Color ::= "colorOf" "(" Fruit ")" [function]
endmodule

In this example, the sorts of the argument to the function are unchanged, but
everything else has been wrapped in double quotation marks. This is because
in BNF notation, we distinguish between two types of production items:
terminals and non-terminals. A terminal represents simply a literal
string of characters that is verbatim part of the syntax of that production.
A non-terminal, conversely, represents a sort name, where the syntax of that
production accepts any valid term of that sort at that position.

This is why, when we wrote the program colorOf(Banana()), krun was able to
execute it: the program represented a term of sort Color that could be
parsed and interpreted by K's interpreter. In other words, krun parses and
interprets terms according to the grammar defined by the developer. The program
is automatically converted into an AST, and then the colorOf
function is evaluated using the function rules provided in the definition.

Bringing us back to the file lesson-03-a.k, we can see that this grammar
has given a simple BNF grammar for expressions over Booleans. We have defined
constructors corresponding to the Boolean values true and false, and functions
corresponding to the Boolean operators for and, or, not, and xor. We have also
given a syntax for each of these functions based on their syntax in the C
programming language. As such, we can now write programs in the simple language
we have defined.

Input the following program into your editor as and.bool in the same
directory:

true && false

We cannot interpret this program yet, because we have not given rules defining
the meaning of the && function yet, but we can parse it. To do this, you can
run (from the same directory):

kast --output kore and.bool

kast is K's just-in-time parser. It will generate a grammar from your K
definition on the fly and use it to parse the program passed on the command
line. The --output flag controls how the resulting AST is represented; don't
worry about the possible values yet, just use kore.

You ought to get the following AST printed on standard output, minus the
formatting:

inj{SortBoolean{}, SortKItem{}}(
  Lbl'UndsAnd-And-UndsUnds'LESSON-03-A'Unds'Boolean'Unds'Boolean'Unds'Boolean{}(
    Lbltrue'Unds'LESSON-03-A'Unds'Boolean{}(),
    Lblfalse'Unds'LESSON-03-A'Unds'Boolean{}()
  )
)

Don't worry about what exactly this means yet, just understand that it
represents the AST of the program that you just parsed. You ought to be able
to recognize the basic shape of it by seeing the words true, false, and
And in there. This is Kore, the intermediate representation of K, and we
will cover it in detail later.

Note that you can also tell kast to print the AST in other formats. For a
more direct representation of the original K, while still maintaining the
structure of an AST, you can say kast --output kast and.bool. This will
yield the following output:

`_&&__LESSON-03-A_Boolean_Boolean_Boolean`(
  `true_LESSON-03-A_Boolean`(.KList),
  `false_LESSON-03-A_Boolean`(.KList)
)

Note how the first output is largely a name-mangled version of the second
output. The one difference is the presence of the inj symbol in the Kore
output. We will talk more about this in later lessons.

Exercise

Parse the expression false || true with --output kast. See if you can
predict approximately what the corresponding output would be with
--output kore, then run the command yourself and compare it to your
prediction.

Ambiguities

Now let's try a slightly more advanced example. Input the following program
into your editor as and-or.bool:

true && false || false

When you try to parse this program, you ought to see the following error:

[Error] Inner Parser: Parsing ambiguity.
1: syntax Boolean ::= Boolean "||" Boolean [function]

`_||__LESSON-03-A_Boolean_Boolean_Boolean`(`_&&__LESSON-03-A_Boolean_Boolean_Boolean`(`true_LESSON-03-A_Boolean`(.KList),`false_LESSON-03-A_Boolean`(.KList)),`false_LESSON-03-A_Boolean`(.KList))
2: syntax Boolean ::= Boolean "&&" Boolean [function]

`_&&__LESSON-03-A_Boolean_Boolean_Boolean`(`true_LESSON-03-A_Boolean`(.KList),`_||__LESSON-03-A_Boolean_Boolean_Boolean`(`false_LESSON-03-A_Boolean`(.KList),`false_LESSON-03-A_Boolean`(.KList)))
        Source(./and-or.bool)
        Location(1,1,1,23)

This error is saying that kast was unable to parse this program because it is
ambiguous. K's just-in-time parser is a GLL parser, which means it can handle
the full generality of context-free grammars, including those grammars which
are ambiguous. An ambiguous grammar is one where the same string can be parsed
as multiple distinct ASTs. In this example, it can't decide whether it should
be parsed as (true && false) || false or as true && (false || false). As a
result, it reports the error to the user.

Brackets

Currently there is no way of resolving this ambiguity, making it impossible
to write complex expressions in this language. This is obviously a problem.
The standard solution in most programming languages to this problem is to
use parentheses to indicate the appropriate grouping. K generalizes this notion
into a type of production called a bracket. A bracket production in K
is any production with the bracket attribute. It is required that such a
production only have a single non-terminal, and the sort of the production
must equal the sort of that non-terminal. However, K does not otherwise
impose restrictions on the grammar the user provides for a bracket. With that
being said, the most common type of bracket is one in which a non-terminal
is surrounded by terminals representing some type of bracket such as
(), [], {}, <>, etc. For example, we can define the most common
type of bracket, the type used by the vast majority of programming languages,
quite simply.

Consider the following modified definition, which we will save to
lesson-03-d.k:

module LESSON-03-D

  syntax Boolean ::= "true" | "false"
                   | "(" Boolean ")" [bracket]
                   | "!" Boolean [function]
                   | Boolean "&&" Boolean [function]
                   | Boolean "^" Boolean [function]
                   | Boolean "||" Boolean [function]

endmodule

In this definition, if the user does not write parentheses explicitly, the
grammar remains ambiguous and K's just-in-time parser will still report an
error. However, you are now able to parse more complex programs by explicitly
grouping subterms with the bracket we have just defined.

Consider and-or-left.bool:

(true && false) || false

Now consider and-or-right.bool:

true && (false || false)

If you parse these programs with kast, you will once again get a single
unique AST with no error. If you look, you might notice that the bracket itself
does not appear in the AST. In fact, this is a property unique to brackets:
productions with the bracket attribute are not represented in the parsed AST
of a term, and the child of the bracket is folded immediately into the parent
term. This is the reason for the requirement that a bracket production have
a single non-terminal of the same sort as the production itself.

Exercise

Write out what you expect the AST to be arising from parsing these two programs
above with --output kast, then parse them yourself and compare them to the
AST you expected. Confirm for yourself that the bracket production does not
appear in the AST.

Tokens

So far we have seen how we can define the grammar of a language. However,
the grammar is not the only relevant part of parsing a language. Also relevant
is the lexical syntax of the language. Thus far, we have implicitly been using
K's automatic lexer generation to generate a token in the scanner for each
terminal in our grammar. However, sometimes we wish to define more complex
lexical syntax. For example, consider the case of integers in C: an integer
consists of a decimal, octal, or hexadecimal number followed by an optional
suffix indicating the type of the literal.

In theory it would be possible to define this syntax via a grammar, but not
only would it be cumbersome and tedious, you would also then have to deal with
an AST generated for the literal which is not convenient to work with.

Instead of doing this, K allows you to define token productions, where
a production consists of a regular expression followed by the token
attribute, and the resulting AST consists of a typed string containing the
value recognized by the regular expression.

For example, the builtin integers in K are defined using the following
production:

syntax Int ::= r"[\\+-]?[0-9]+" [token]

Here we can see that we have defined that an integer is an optional sign
followed by a nonempty sequence of digits. The r preceding the terminal
indicates that what appears inside the double quotes is a regular expression,
and the token attribute indicates that terms which parse as this production
should be converted into a token by the parser.
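
For example, in a definition where this production is part of the program
grammar, parsing the program 42 with kast --output kast should yield,
roughly, the following typed string token (a sketch of the expected shape):

#token("42","Int")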

It is also possible to define tokens that do not use regular expressions. This
can be useful when you wish to declare particular identifiers for use in your
semantics later. For example:

syntax Id ::= "main" [token]

Here, we declare that main is a token of sort Id. Instead of being parsed
as a symbol, it gets parsed as a token, generating a typed string in the AST.
This is useful in a semantics of C because the parser generally does not treat
the main function in C specially; only the semantics treats it specially.

Of course, languages can have more complex lexical syntax. For example, if we
wish to define the syntax of integers in C, we could use the following
production:

syntax IntConstant ::= r"(([1-9][0-9]*)|(0[0-7]*)|(0[xX][0-9a-fA-F]+))(([uU][lL]?)|([uU]((ll)|(LL)))|([lL][uU]?)|(((ll)|(LL))[uU]?))?" [token]

As you may have noted above, long and complex regular expressions
can be hard to read. They also suffer from the problem that unlike a grammar,
they are not particularly modular.

We can get around this restriction by declaring explicit regular expressions,
giving them a name, and then referring to them in productions.

Consider the following (equivalent) way to define the lexical syntax of
integers in C:

syntax IntConstant ::= r"({DecConstant}|{OctConstant}|{HexConstant})({IntSuffix}?)" [token]
syntax lexical DecConstant = r"{NonzeroDigit}({Digit}*)"
syntax lexical OctConstant = r"0({OctDigit}*)"
syntax lexical HexConstant = r"{HexPrefix}({HexDigit}+)"
syntax lexical HexPrefix = r"0x|0X"
syntax lexical NonzeroDigit = r"[1-9]"
syntax lexical Digit = r"[0-9]"
syntax lexical OctDigit = r"[0-7]"
syntax lexical HexDigit = r"[0-9a-fA-F]"
syntax lexical IntSuffix = r"{UnsignedSuffix}({LongSuffix}?)|{UnsignedSuffix}{LongLongSuffix}|{LongSuffix}({UnsignedSuffix}?)|{LongLongSuffix}({UnsignedSuffix}?)"
syntax lexical UnsignedSuffix = r"[uU]"
syntax lexical LongSuffix = r"[lL]"
syntax lexical LongLongSuffix = r"ll|LL"

As you can see, this is rather more verbose, but it has the benefit of both
being much easier to read and understand, and also increased modularity.
Note that we refer to a named regular expression by putting the name in curly
brackets. Note also that only the first sentence actually declares a new piece
of syntax in the language. When the user writes syntax lexical, they are only
declaring a regular expression. To declare an actual piece of syntax in the
grammar, you still must actually declare an explicit token production.

One final note: K uses Flex to implement
its lexical analysis. As a result, you can refer to the
Flex Manual
for a detailed description of the regular expression syntax supported. Note
that for performance reasons, Flex's regular expressions are actually a regular
language, and thus lack some of the syntactic convenience of modern
"regular expression" libraries. If you need features that are not part of the
syntax of Flex regular expressions, you are encouraged to express them via
a grammar instead.

Ahead-of-time parser generation

So far we have been entirely focused on K's support for just-in-time parsing,
where the parser is generated on the fly prior to being used. This approach
makes the parser fast to generate, but it costs performance when you
have to repeatedly parse programs with the same parser. For this reason, it is
generally encouraged that when parsing programs, you use K's ahead-of-time
parser generation. K makes use of
GNU Bison to generate parsers.

By default, you can enable ahead-of-time parsing via the --gen-bison-parser
flag to kompile. This will make use of Bison's LR(1) parser generator. As
such, if your grammar is not LR(1), it may not parse exactly the same as if
you were to use the just-in-time parser, because Bison will automatically pick
one of the possible branches whenever it encounters a shift-reduce or
reduce-reduce conflict. In this case, you can either modify your grammar to be
LR(1), or you can enable use of Bison's GLR support by instead passing
--gen-glr-bison-parser to kompile. Note that if your grammar is ambiguous,
the ahead-of-time parser will not provide you with particularly readable error
messages at this time.

If you have a K definition named foo.k, and it generates a directory when
you run kompile called foo-kompiled, you can invoke the ahead-of-time
parser you generated by running foo-kompiled/parser_PGM <file> on a file.
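
Concretely, for the lesson-03-d.k definition above, the ahead-of-time workflow
would look something like this:

kompile --gen-bison-parser lesson-03-d.k
lesson-03-d-kompiled/parser_PGM and-or-left.bool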

Exercises

  1. Compile lesson-03-d.k with ahead-of-time parsing enabled. Then compare
    how long it takes to run kast --output kore and-or-left.bool with how long it
    takes to run lesson-03-d-kompiled/parser_PGM and-or-left.bool. Confirm for
    yourself that both produce the same result, but that the latter is faster.

  2. Define a simple grammar consisting of integers, brackets, addition,
    subtraction, multiplication, division, and unary negation. Integers should be
    in decimal form and lexically without a sign, whereas negative numbers can be
    represented via unary negation. Ensure that you are able to parse some basic
    arithmetic expressions using a generated ahead-of-time parser. Do not worry
    about disambiguating the grammar or about writing rules to implement the
    operations in this definition.

  3. Write a program where the meaning of the arithmetic expression based on
    the grammar you defined above is ambiguous, and then write programs that
    express each individual intended meaning using brackets.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.4: Disambiguating Parses.

Lesson 1.4: Disambiguating Parses

The purpose of this lesson is to teach how to use K's builtin features for
disambiguation to transform an ambiguous grammar into an unambiguous one that
expresses the intended ASTs.

Priority blocks

In practice, very few formal languages outside the domain of natural language
processing are ambiguous. The main reason for this is that parsing unambiguous
languages is asymptotically faster than parsing ambiguous languages.
Programming language designers instead usually use the notions of operator
precedence and associativity to make expression grammars unambiguous. These
mechanisms work by instructing the parser to reject certain ASTs in favor of
others in case of ambiguities; it is often possible to remove all ambiguities
in a grammar with these techniques.

While it is sometimes possible to explicitly rewrite the grammar to remove
these parses, because K's grammar specification and AST generation are
inextricably linked, this is generally discouraged. Instead, we use the
approach of explicitly expressing the relative precedence of different
operators in different situations in order to resolve the ambiguity.

For example, in C, && binds tighter in precedence than ||, meaning that
the expression true && false || false has only one valid AST:
(true && false) || false.

Consider, then, the third iteration on the grammar of this definition
(lesson-04-a.k):

module LESSON-04-A

  syntax Boolean ::= "true" | "false"
                   | "(" Boolean ")" [bracket]
                   > "!" Boolean [function]
                   > Boolean "&&" Boolean [function]
                   > Boolean "^" Boolean [function]
                   > Boolean "||" Boolean [function]

endmodule

In this example, some of the | symbols separating productions in a single
block have been replaced with >. This serves to describe the
priority groups associated with this block of productions.

In this example, the first priority group consists of the atoms of the
language: true, false, and the bracket operator. In general, a priority
group starts either at the ::= or > operator and extends until either the
next > operator or the end of the production block. Thus, we can see that the
second, third, fourth, and fifth priority groups in this grammar all consist
of a single production.

The meaning of these priority groups becomes apparent when parsing programs:
a symbol with a lesser priority (i.e., one that binds looser) cannot
appear as the direct child of a symbol with a greater priority (i.e., one
that binds tighter). In this sense, the > operator can be seen as a
greater-than operator describing a transitive partial ordering on the
productions in the production block, expressing their relative priority.

To see this more concretely, let's look again at the program
true && false || false. As noted before, previously this program was
ambiguous because the parser could either choose that && was the child of ||
or vice versa. However, because a symbol with lesser priority (i.e., ||)
cannot appear as the direct child of a symbol with greater priority
(i.e., &&), the parser will reject the parse where || is under the
&& operator. As a result, we are left with the unambiguous parse
(true && false) || false. Similarly, true || false && false parses
unambiguously as true || (false && false). Conversely, if the user explicitly
wants the other parse, they can express this using brackets by explicitly
writing true && (false || false). This still parses successfully because the
|| operator is no longer the direct child of the && operator, but is
instead the direct child of the () operator, and the && operator is an
indirect parent, which is not subject to the priority restriction.

Astute readers, however, will already have noticed what seems to be a
contradiction: we have defined () as also having greater priority than ||.
One would think that this should mean that || cannot appear as a direct
child of (). This is a problem because priority groups are applied to every
possible parse separately. That is to say, even if the term is unambiguous
prior to this disambiguation rule, we still reject that parse if it violates
the rule of priority.

In fact, however, we do not reject this program as a parse error. Why is that?
Well, the rule for priority is slightly more complex than previously described.
In actual fact, it applies only conditionally. Specifically, it applies in
cases where the child is either the first or last production item in the
parent's production. For example, in the production Bool "&&" Bool, the
first Bool non-terminal is not preceded by any terminals, and the last Bool
non-terminal is not followed by any terminals. As a result of this, we apply
the priority rule to both children of &&. However, in the () operator,
the sole non-terminal is both preceded by and followed by terminals. As a
result, the priority rule is not applied when () is the parent. Because of
this, the program we mentioned above successfully parses.

Exercise

Parse the program true && false || false using kast, and confirm that the AST
places || as the top level symbol. Then modify the definition so that you
will get the alternative parse.

Associativity

Even having broken the expression grammar into priority blocks, the resulting
grammar is still ambiguous. We can see this if we try to parse the following
program (assoc.bool):

true && false && false

Priority blocks will not help us here: the problem arises between two parses
in which the direct parent and child are within a single priority block
(in this case, && is in the same block as itself).

This is where the notion of associativity comes into play. Associativity
applies the following additional rules to parses:

  • a left-associative symbol cannot appear as a direct rightmost child of a
    symbol with equal priority;
  • a right-associative symbol cannot appear as a direct leftmost child of a
    symbol with equal priority; and
  • a non-associative symbol cannot appear as a direct leftmost or rightmost
    child of a symbol with equal priority.

In C, binary operators are all left-associative, meaning that the expression
true && false && false parses unambiguously as (true && false) && false,
because && cannot appear as the rightmost child of itself.

Consider, then, the fourth iteration on the grammar of this definition
(lesson-04-b.k):

module LESSON-04-B

  syntax Boolean ::= "true" | "false"
                   | "(" Boolean ")" [bracket]
                   > "!" Boolean [function]
                   > left: Boolean "&&" Boolean [function]
                   > left: Boolean "^" Boolean [function]
                   > left: Boolean "||" Boolean [function]

endmodule

Here each priority group, immediately after the ::= or > operator, can
be followed by a symbol representing the associativity of that priority group:
either left: for left associativity, right: for right associativity, or
non-assoc: for non-associativity. In this example, each priority group we
apply associativity to has only a single production, but we could equally well
write a priority block with multiple productions and an associativity.

For example, consider the following, different grammar (lesson-04-c.k):

module LESSON-04-C

  syntax Boolean ::= "true" | "false"
                   | "(" Boolean ")" [bracket]
                   > "!" Boolean [function]
                   > left: 
                     Boolean "&&" Boolean [function]
                   | Boolean "^" Boolean [function]
                   | Boolean "||" Boolean [function]

endmodule

In this example, unlike the one above, &&, ^, and || have the same
priority. However, viewed as a group, the entire group is left associative.
This means that none of &&, ^, and || can appear as the right child of
any of &&, ^, or ||. As a result of this, this grammar is also not
ambiguous. However, it expresses a different grammar, and you are encouraged
to think about what the differences are in practice.

Exercise

Parse the program true && false && false yourself, and confirm that the AST
places the rightmost && at the top of the expression. Then modify the
definition to generate the alternative parse.

Explicit priority and associativity declarations

Previously we have only considered the case where all of the productions
which you wish to express a priority or associativity relation over are
co-located in the same block of productions. However, in practice this is not
always feasible or desirable, especially as a definition grows in size across
multiple modules.

As a result of this, K provides a second way of declaring priority and
associativity relations.

Consider the following grammar, which we will name lesson-04-d.k and which
expresses the exact same grammar as lesson-04-b.k:

module LESSON-04-D

  syntax Boolean ::= "true" [literal] | "false" [literal]
                   | "(" Boolean ")" [atom, bracket]
                   | "!" Boolean [not, function]
                   | Boolean "&&" Boolean [and, function]
                   | Boolean "^" Boolean [xor, function]
                   | Boolean "|" Boolean [or, function]

  syntax priorities literal atom > not > and > xor > or
  syntax left and
  syntax left xor
  syntax left or
endmodule

This introduces a couple of new features of K. First of all, we see a bunch of
attributes we don't already recognize. These are actually not built-in
attributes, but rather user-defined attributes that are used to group
productions together conceptually. For example, literal in the
syntax priorities sentence is used to refer to the productions with the
literal attribute, i.e., true and false.

Once we understand this, it becomes relatively straightforward to understand
the meaning of this grammar. Each syntax priorities sentence defines a
priority relation where each > separates a priority group containing all
the productions with at least one of the attributes in that group, and each
syntax left, syntax right, or syntax non-assoc sentence defines an
associativity relation connecting all the productions with one of the target
attributes together into a left-, right-, or non-associative grouping.
Specifically, this means that:

syntax left a b

is different from:

syntax left a
syntax left b

As a consequence of this, syntax [left|right|non-assoc] should not be used to
group together labels with different priorities.

Prefer/avoid

Sometimes priority and associativity prove insufficient to disambiguate a
grammar. In particular, sometimes it is desirable to be able to choose between
two ambiguous parses directly while still not rejecting any parses if the term
parsed is unambiguous. A good example of this is the famous "dangling else"
problem in imperative C-like languages.

Consider the following definition (lesson-04-e.k):

module LESSON-04-E

  syntax Exp ::= "true" | "false"
  syntax Stmt ::= "if" "(" Exp ")" Stmt
                | "if" "(" Exp ")" Stmt "else" Stmt
                | "{" "}"
endmodule

We can write the following program (dangling-else.if):

if (true) if (false) {} else {}

This is ambiguous because it is unclear whether the else clause is part of
the outer if or the inner if. At first we might try to resolve this with
priorities, saying that the if without an else cannot appear as a child of
the if with an else. However, because the non-terminal in the parent symbol
is both preceded and followed by a terminal, this will not work.

Instead, we can resolve the ambiguity directly by telling the parser to
"prefer" or "avoid" certain productions when ambiguities arise. For example,
when we parse this program, we see the following ambiguity as an error message:

[Error] Inner Parser: Parsing ambiguity.
1: syntax Stmt ::= "if" "(" Exp ")" Stmt

`if(_)__LESSON-04-E_Stmt_Exp_Stmt`(`true_LESSON-04-E_Exp`(.KList),`if(_)_else__LESSON-04-E_Stmt_Exp_Stmt_Stmt`(`false_LESSON-04-E_Exp`(.KList),`{}_LESSON-04-E_Stmt`(.KList),`{}_LESSON-04-E_Stmt`(.KList)))
2: syntax Stmt ::= "if" "(" Exp ")" Stmt "else" Stmt

`if(_)_else__LESSON-04-E_Stmt_Exp_Stmt_Stmt`(`true_LESSON-04-E_Exp`(.KList),`if(_)__LESSON-04-E_Stmt_Exp_Stmt`(`false_LESSON-04-E_Exp`(.KList),`{}_LESSON-04-E_Stmt`(.KList)),`{}_LESSON-04-E_Stmt`(.KList))
        Source(./dangling-else.if)
        Location(1,1,1,30)

Roughly, we see that the ambiguity is between an if with an else or an if
without an else. Since we want to pick the first parse, we can tell K to
"avoid" the second parse with the avoid attribute. Consider the following
modified definition (lesson-04-f.k):

module LESSON-04-F

  syntax Exp ::= "true" | "false"
  syntax Stmt ::= "if" "(" Exp ")" Stmt
                | "if" "(" Exp ")" Stmt "else" Stmt [avoid]
                | "{" "}"
endmodule

Here we have added the avoid attribute to the else production. As a result,
when an ambiguity occurs and one or more of the possible parses has that symbol
at the top of the ambiguous part of the parse, we remove those parses from
consideration and consider only those remaining. The prefer attribute behaves
similarly, but instead removes all parses which do not have that attribute.
In both cases, no action is taken if the parse is not ambiguous.
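
For illustration, the same parse could have been selected with prefer instead
of avoid, by marking the production we want to win. The following is a sketch
(the module name is ours):

module LESSON-04-G

  syntax Exp ::= "true" | "false"
  // prefer keeps only the parses with this symbol at the top of the ambiguity
  syntax Stmt ::= "if" "(" Exp ")" Stmt [prefer]
                | "if" "(" Exp ")" Stmt "else" Stmt
                | "{" "}"
endmodule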

Exercises

  1. Parse the program if (true) if (false) {} else {} using lesson-04-f.k
    and confirm that the else clause is part of the innermost if statement. Then
    modify the definition so that you will get the alternative parse.

  2. Modify your solution from lesson 1.3, problem 2 so that unary negation
    binds tighter than multiplication and division, which bind tighter than
    addition and subtraction, and each binary operator is left-associative.
    Write these priority and associativity declarations both inline and explicitly.

  3. Write a simple grammar containing at least one ambiguity that cannot be
    resolved via priority or associativity, and then use the prefer attribute to
    resolve that ambiguity.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.5: Modules, Imports, and Requires.

Lesson 1.5: Modules, Imports, and Requires

The purpose of this lesson is to explain how K definitions can be broken into
separate modules and files and how these distinct components combine into a
complete K definition.

K's outer syntax

Recall from Lesson 1.3 that K's grammar is broken
into two components: the outer syntax of K and the inner syntax of K.
Outer syntax, as previously mentioned, consists of requires, modules,
imports, and sentences. A K semantics is expressed by the set of
sentences contained in the definition. The scope of what is considered
contained in that definition is determined both by the main semantics
module
of a K definition, as well as the requires and imports present
in the file that contains that module.

Basic module syntax

The basic unit of grouping sentences in K is the module. A module consists
of a module name, an optional list of attributes, a list of
imports, and a list of sentences.

A module name consists of one or more groups of letters, numbers, or
underscores, separated by a hyphen. Here are some valid module names: FOO,
FOO-BAR, foo0, foo0_bar-Baz9. Here are some invalid module names: -,
-FOO, BAR-, FOO--BAR. Stylistically, module names are usually all
uppercase with hyphens separating words, but this is not strictly enforced.

Some example modules include an empty module:

module LESSON-05-A

endmodule

A module with some attributes:

module LESSON-05-B [attr1, attr2, attr3(value)]

endmodule

A module with some sentences:

module LESSON-05-C
  syntax Boolean ::= "true" | "false"
  syntax Boolean ::= "not" Boolean [function]
  rule not true => false
  rule not false => true
endmodule

Imports

Thus far we have only discussed definitions containing a single module.
Definitions can also contain multiple modules, in which one module imports
others.

An import in K appears at the top of a module, prior to any sentences. It can
be specified with the imports keyword, followed by a module name.

For example, here is a simple definition with two modules (lesson-05-d.k):

module LESSON-05-D-1
  syntax Boolean ::= "true" | "false"
  syntax Boolean ::= "not" Boolean [function]
endmodule

module LESSON-05-D
  imports LESSON-05-D-1

  rule not true => false
  rule not false => true
endmodule

This K definition is equivalent to the definition expressed by the single module
LESSON-05-C. Essentially, by importing a module, we include all of the
sentences of the imported module in the module doing the importing.
There are a few minor differences between importing a module and simply
including its sentences in another module directly, but we will cover these
differences later. For now, you can think of modules as a way of
conceptually grouping sentences in a larger K definition.

Exercise

Modify lesson-05-d.k to include four modules: one containing the syntax; two
modules, each importing the first and containing one of the two rules; and a
final module LESSON-05-D, containing no sentences, that imports the second and
third modules. Check that the definition still compiles and that you can
still evaluate the not function.

Parsing in the presence of multiple modules

As you may have noticed, each module in a definition can express a distinct set
of syntax. When parsing the sentences in a module, we use the syntax
of that module, enriched with the basic syntax of K, in order to parse
rules in that module. For example, the following definition is a parser error
(lesson-05-e.k):

module LESSON-05-E-1
  rule not true => false
  rule not false => true
endmodule

module LESSON-05-E-2
  syntax Boolean ::= "true" | "false"
  syntax Boolean ::= "not" Boolean [function]
endmodule

This is because the syntax referenced in module LESSON-05-E-1, namely, not,
true, and false, is not imported by that module. You can solve this problem
by simply importing the modules containing the syntax you want to use in your
sentences.
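
For example, a minimal fix is to add the missing import to LESSON-05-E-1 (the
order of modules within the file should not matter for this):

module LESSON-05-E-1
  imports LESSON-05-E-2

  rule not true => false
  rule not false => true
endmodule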

Main syntax and semantics modules

When we are compiling a K definition, we need to know where to start. We
designate two specific entry point modules: the main syntax module
and the main semantics module. The main syntax module, as well as all the
modules it imports recursively, are used to create the parser for programs that
you use to parse programs that you execute with krun. The main semantics
module, as well as all the modules it imports recursively, are used to
determine the rules that can be applied at runtime in order to execute a
program. For example, in the above example, if the main semantics module is
module LESSON-05-D-1, then not is an uninterpreted function (i.e., has no
rules associated with it), and the rules in module LESSON-05-D-2 are not
included.

While you can specify the entry point modules explicitly by passing the
--main-module and --syntax-module flags to kompile, by default, if you
type kompile foo.k, then the main semantics module will be FOO and the
main syntax module will be FOO-SYNTAX.
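
For example, to compile lesson-05-d.k from earlier in this lesson with
LESSON-05-D as both entry points, you could run:

kompile lesson-05-d.k --main-module LESSON-05-D --syntax-module LESSON-05-D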

Splitting a definition into multiple files

So far we have discussed ways to break definitions into separate conceptual
components (modules). K also provides a mechanism for combining multiple files
into a single K definition: the requires directive.

In K, the requires keyword has two meanings. The first, the requires
statement, appears at the top of a K file, prior to any module declarations. It
consists of the keyword requires followed by a double-quoted string. The
second meaning of the requires keyword will be covered in a later lesson,
but it is distinguished because the second case occurs only inside modules.

The string passed to the requires statement contains a filename. When you run
kompile on a file, it will look at all of the requires statements in that
file, look up those files on disk, parse them, and then recursively process all
the requires statements in those files. It then combines all the modules in all
of those files together, and uses them collectively as the set of modules to
which imports statements can refer.

Putting it all together

Putting it all together, here is one possible way in which we could break the
definition lesson-02-c.k from Lesson 1.2 into
multiple files and modules:

colors.k:

module COLORS
  syntax Color ::= Yellow()
                 | Blue()
endmodule

fruits.k:

module FRUITS
  syntax Fruit ::= Banana()
                 | Blueberry()
endmodule

colorOf.k:

requires "fruits.k"
requires "colors.k"

module COLOROF-SYNTAX
  imports COLORS
  imports FRUITS

  syntax Color ::= colorOf(Fruit) [function]
endmodule

module COLOROF
  imports COLOROF-SYNTAX

  rule colorOf(Banana()) => Yellow()
  rule colorOf(Blueberry()) => Blue()
endmodule

You would then compile this definition with kompile colorOf.k and use it the
same way as the original, single-module definition.

Exercise

Modify the name of the COLOROF module, and then recompile the definition.
Try to understand why you now get a compiler error. Then, resolve this compiler
error by passing the --main-module and --syntax-module flags to kompile.

Include path

A note is in order about how paths are resolved in requires statements.

By default, the path you specify is allowed to be an absolute or a relative
path. If the path is absolute, that exact file is imported. If the path is
relative, a matching file is looked for within all of the
include directories specified to the compiler. By default, the include
directories include the current working directory, followed by the
include/kframework/builtin directory within your installation of K. You can
also pass one or more directories to kompile via the -I command line flag,
in which case these directories are prepended to the beginning of the list.
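
For example, if fruits.k and colors.k were moved into a sibling directory
named deps (a hypothetical layout), you could compile colorOf.k with:

kompile colorOf.k -I ../deps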

Exercises

  1. Take the solution to lesson 1.4, problem 2 which included the explicit
    priority and associativity declarations, and modify the definition so that
    the syntax of integers and brackets is in one module, the syntax of addition,
    subtraction, and unary negation is in another module, and the syntax of
    multiplication and division is in a third module. Make sure you can still parse
    the same set of expressions as before. Place priority declarations in the main
    module.

  2. Modify lesson-02-d.k from lesson 1.2 so that the rules and syntax are in
    separate modules in separate files.

  3. Place the file containing the syntax from problem 2 in another directory,
    then recompile the definition. Observe why a compilation error occurs. Then
    fix the compiler error by passing -I to kompile.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.6: Integers and Booleans.

Lesson 1.6: Integers and Booleans

The purpose of this lesson is to explain the two most basic types of builtin
sorts in K, the Int sort and the Bool sort, representing
arbitrary-precision integers and Boolean values, respectively.

Builtin sorts in K

K provides definitions of some useful sorts in
domains.md, found in the
include/kframework/builtin directory of the K installation. This file is
defined via a
Literate programming
style that we will discuss in a future lesson. We will not cover all of the
sorts found there immediately; rather, this lesson discusses some of the
details surrounding integers and Booleans, and explains how to look up more
detailed information about builtin functions in K's
documentation.

Booleans in K

The most basic builtin sort K provides is the Bool sort, representing
Boolean values (i.e., true and false). You have already seen how we were
able to create this type ourselves using K's parsing and disambiguation
features. However, in the vast majority of cases, we prefer instead to import
the version of Boolean algebra defined by K itself. Most simply, you can do
this by importing the module BOOL in your definition. For example
(lesson-06-a.k):

module LESSON-06-A
  imports BOOL

  syntax Fruit ::= Blueberry() | Banana()
  syntax Bool ::= isBlue(Fruit) [function]

  rule isBlue(Blueberry()) => true
  rule isBlue(Banana()) => false
endmodule

Here we have defined a simple predicate, i.e., a function returning a
Boolean value. We are now able to perform the usual Boolean operations of
and, or, and not over these values. For example (lesson-06-b.k):

module LESSON-06-B
  imports BOOL

  syntax Fruit ::= Blueberry() | Banana()
  syntax Bool ::= isBlue(Fruit) [function]

  rule isBlue(Blueberry()) => true
  rule isBlue(Banana()) => false

  syntax Bool ::= isYellow(Fruit) [function]
                | isBlueOrYellow(Fruit) [function]

  rule isYellow(Banana()) => true
  rule isYellow(Blueberry()) => false

  rule isBlueOrYellow(F) => isBlue(F) orBool isYellow(F)
endmodule

In the above example, Boolean inclusive or is performed via the orBool
function, which is defined in the BOOL module. As a matter of convention,
many functions over builtin sorts in K are suffixed with the name of the
primary sort over which those functions are defined. This happens so that the
syntax of K does not (generally) conflict with the syntax of any other
programming language, which would make it harder to define that programming
language in K.

Exercise

Write a function isBlueAndNotYellow which computes the appropriate Boolean
expression. If you are unsure what the appropriate syntax is to use, you
can refer to the BOOL module in
domains.md. Add a term of
sort Fruit for which isBlue and isYellow both return true, and test that
the isBlueAndNotYellow function behaves as expected on all three Fruits.

Syntax Modules

For most sorts in domains.md, K defines more than one module that can be
imported by users. For example, for the Bool sort, K defines the BOOL
module that has previously already been discussed, but also provides the
BOOL-SYNTAX module. This module, unlike the BOOL module, only declares the
values true and false, but not any of the functions that operate over the
Bool sort. The rationale is that you may want to import this module into the
main syntax module of your definition in some cases, whereas you generally do
not want to do this with the version of the module that includes all the
functions over the Bool sort. For example, if you were defining the semantics
of C++, you might import BOOL-SYNTAX into the syntax module of your
definition, because true and false are part of the grammar of C++, but
you would only import the BOOL module into the main semantics module, because
C++ defines its own syntax for and, or, and not that is different from the
syntax defined in the BOOL module.

Here, for example, is how we might redefine our Boolean expression calculator
to use the Bool sort while maintaining an idiomatic structure of modules
and imports, for the first time including the rules to calculate the values of
expressions themselves (lesson-06-c.k):

module LESSON-06-C-SYNTAX
  imports BOOL-SYNTAX

  syntax Bool ::= "(" Bool ")" [bracket]
                > "!" Bool [function]
                > left:
                  Bool "&&" Bool [function]
                | Bool "^" Bool [function]
                | Bool "||" Bool [function]
endmodule

module LESSON-06-C
  imports LESSON-06-C-SYNTAX
  imports BOOL

  rule ! B => notBool B
  rule A && B => A andBool B
  rule A ^ B => A xorBool B
  rule A || B => A orBool B
endmodule

Note the encapsulation of syntax: the LESSON-06-C-SYNTAX module contains
exactly the syntax of our Boolean expressions, and no more, whereas any other
syntax needed to implement those functions is in the LESSON-06-C module
instead.

Exercise

Add an "implies" function to the above Boolean expression calculator, using the
-> symbol to represent implication. You can look up K's builtin "implies"
function in the BOOL module in domains.md.

Integers in K

Unlike most programming languages, where the most basic integer type is a
fixed-precision integer type, the most commonly used integer sort in K is
the Int sort, which represents the mathematical integers, i.e.,
arbitrary-precision integers.

K provides three main modules for import when using the Int sort. The first,
containing all the syntax of integers as well as all of the functions over
integers, is the INT module. The second, which provides just the syntax
of integer literals themselves, is the INT-SYNTAX module. However, unlike
most builtin sorts in K, K also provides a third module for the Int sort:
the UNSIGNED-INT-SYNTAX module. This module provides only the syntax of
non-negative integers, i.e., natural numbers. The reason for this involves
lexical ambiguity: in most programming languages, -1 is not itself a literal,
but rather the unary negation operator applied to the literal 1. K provides
this module to make it easier to specify the syntax of such languages.
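
For illustration, here is a minimal sketch (the module names and the Exp sort
are ours, not part of the K distribution) of how a language with negative
number syntax might use this module:

module LESSON-06-NEG-SYNTAX
  imports UNSIGNED-INT-SYNTAX

  // Because UNSIGNED-INT-SYNTAX provides only non-negative literals,
  // -1 parses unambiguously as unary negation applied to the literal 1.
  syntax Exp ::= Int | "-" Exp [function]
endmodule

module LESSON-06-NEG
  imports LESSON-06-NEG-SYNTAX
  imports INT

  rule - I:Int => 0 -Int I
endmodule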

For detailed information about the functions available over the Int sort,
refer to domains.md. Note again how most integer operations are suffixed with
Int (for example, +Int, -Int, and *Int) so that they do not collide with the
syntax of other programming languages.

Exercises

  1. Extend your solution from lesson 1.4, problem 2 to implement the rules
    that define the behavior of addition, subtraction, multiplication, and
    division. Do not worry about the case when the user tries to divide by zero
    at this time. Use /Int to implement division. Test your new calculator
    implementation by executing the arithmetic expressions you wrote as part of
    lesson 1.3, problem 2. Check to make sure each computes the value you expected.

  2. Combine the Boolean expression calculator from this lesson with your
    solution to problem 1, and then extend the combined calculator with the <,
    <=, >, >=, ==, and != expressions. Write some Boolean expressions
    that combine integer and Boolean operations, and test to ensure that these
    expressions return the expected truth value.

  3. Compute the following expressions using your solution from problem 2:
    7 / 3, 7 / -3, -7 / 3, -7 / -3. Then replace the /Int function in
    your definition with divInt instead, and observe how the value of the above
    expressions changes. Why does this occur?

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.7: Side Conditions and Rule Priority.

Lesson 1.7: Side Conditions and Rule Priority

The purpose of this lesson is to explain how to write conditional rules in K,
and to explain how to control the order in which rules are tried.

Side Conditions

So far, all of the rules we have discussed have been unconditional rules.
If the left-hand side of the rule matches the arguments to the function, the
rule applies. However, there is another type of rule, a conditional rule.
A conditional rule consists of a rule body containing the patterns to
match, and a side condition representing a Boolean expression that must
evaluate to true in order for the rule to apply.

Side conditions in K are introduced via the requires keyword immediately
following the rule body. For example, here is a rule with a side condition
(lesson-07-a.k):

module LESSON-07-A
  imports BOOL
  imports INT

  syntax Grade ::= "letter-A" 
                 | "letter-B"
                 | "letter-C"
                 | "letter-D"
                 | "letter-F"
                 | gradeFromPercentile(Int) [function]

  rule gradeFromPercentile(I) => letter-A requires I >=Int 90
endmodule

In this case, the gradeFromPercentile function takes a single integer
argument. The function evaluates to letter-A if the argument passed is
greater than or equal to 90. Note that the side condition is allowed to refer to variables
that appear on the left-hand side of the rule. In the same manner as variables
appearing on the right-hand side, variables that appear in the side condition
evaluate to the value that was matched on the left-hand side. Then the
functions in the side condition are evaluated, which returns a term of sort
Bool. If the term is equal to true, then the rule applies. Bear in mind
that the side condition is only evaluated at all if the patterns on the
left-hand side of the rule match the term being evaluated.

Exercise

Write a rule that evaluates gradeFromPercentile to letter-B if the argument
to the function is in the range [80,90). Test that the function correctly
evaluates various numbers between 80 and 100.

owise Rules

So far, all the rules we have introduced have had the same priority. What
this means is that K does not necessarily enforce an order in which the rules
are tried. We have only discussed functions so far in K, so it is not
immediately clear why this choice was made, given that a function is not
considered well-defined if multiple rules for evaluating it are capable of
evaluating the same arguments to different results. However, in future lessons
we will discuss other types of rules in K, some of which can be
non-deterministic. What this means is that if more than one rule is capable
of matching, then K will explore both possible rules in parallel, and consider
each of their respective results when executing your program. Don't worry too
much about this right now, but just understand that because of the potential
later for nondeterminism, we don't enforce a total ordering on the order in
which rules are attempted to be applied.

However, sometimes this is not practical; it can be very convenient to express
that a particular rule applies only if no other rules for that function are
applicable. This can be expressed by adding the owise attribute to a rule.
What this means, in practice, is that this rule has lower priority than other
rules, and will only be applied after all the other, higher-priority rules
have been tried and have failed.

For example, in the above exercise, we had to add a side condition containing
two Boolean comparisons to the rule we wrote to handle letter-B grades, which
means that in practice we compare the percentile against 90 twice.
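
One possible solution to that exercise (our sketch, not code from the lesson)
is the rule:

  rule gradeFromPercentile(I) => letter-B requires I >=Int 80 andBool I <Int 90
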
We can more efficiently and more idiomatically write the letter-B case for the
gradeFromPercentile rule using the owise attribute (lesson-07-b.k):

module LESSON-07-B
  imports BOOL
  imports INT

  syntax Grade ::= "letter-A" 
                 | "letter-B"
                 | "letter-C"
                 | "letter-D"
                 | "letter-F"
                 | gradeFromPercentile(Int) [function]

  rule gradeFromPercentile(I) => letter-A requires I >=Int 90
  rule gradeFromPercentile(I) => letter-B requires I >=Int 80 [owise]
endmodule

This rule is saying, "if all the other rules do not apply, then the grade is a
B if the percentile is greater than or equal to 80." Note here that we use both
a side condition and an owise attribute on the same rule. This is not
required (as we will see later), but it is allowed. What this means is that the
side condition is only tried if the other rules did not apply and the
left-hand side of the rule matched. You can even use more complex matching on
the left-hand side than simply a variable. More generally, you can also have
multiple higher-priority rules, or multiple owise rules. What this means in
practice is that all of the non-owise rules are tried first, in any order,
followed by all the owise rules, in any order.

Exercise

The grades D and F correspond to the percentile ranges [60, 70) and [0, 60)
respectively. Write another implementation of gradeFromPercentile which
handles only these cases, and uses the owise attribute to avoid redundant
Boolean comparisons. Test that various percentiles in the range [0, 70) are
evaluated correctly.

Rule Priority

As it happens, the owise attribute is a specific case of a more general
concept we call rule priority. In essence, each rule is assigned an integer
priority. Rules are tried in increasing order of priority, starting with a
rule with priority zero, and trying each increasing numerical value
successively.

By default, a rule is assigned a priority of 50. If the rule has the owise
attribute, it is instead given the priority 200. You can see why this will
cause owise rules to be tried after regular rules.

However, it is also possible to directly assign a numerical priority to a rule
via the priority attribute. For example, here is an alternative way
we could express the same two rules in the gradeFromPercentile function
(lesson-07-c.k):

module LESSON-07-C
  imports BOOL
  imports INT

  syntax Grade ::= "letter-A" 
                 | "letter-B"
                 | "letter-C"
                 | "letter-D"
                 | "letter-F"
                 | gradeFromPercentile(Int) [function]

  rule gradeFromPercentile(I) => letter-A requires I >=Int 90 [priority(50)]
  rule gradeFromPercentile(I) => letter-B requires I >=Int 80 [priority(200)]
endmodule

We can, of course, assign a priority equal to any non-negative integer. For
example, here is a more complex example that handles the remaining grades
(lesson-07-d.k):

module LESSON-07-D
  imports BOOL
  imports INT

  syntax Grade ::= "letter-A" 
                 | "letter-B"
                 | "letter-C"
                 | "letter-D"
                 | "letter-F"
                 | gradeFromPercentile(Int) [function]

  rule gradeFromPercentile(I) => letter-A requires I >=Int 90 [priority(50)]
  rule gradeFromPercentile(I) => letter-B requires I >=Int 80 [priority(51)]
  rule gradeFromPercentile(I) => letter-C requires I >=Int 70 [priority(52)]
  rule gradeFromPercentile(I) => letter-D requires I >=Int 60 [priority(53)]
  rule gradeFromPercentile(_) => letter-F                     [priority(54)]
endmodule

Note that we have introduced a new piece of syntax here: _. This is actually
just a variable. However, as a special case, when a variable is named _, it
does not bind a value that can be used on the right-hand side of the rule, or
in a side condition. Effectively, _ is a placeholder variable that means "I
don't care about this term."

In this example, we have explicitly expressed the order in which the rules of
this function are tried. Since rules are tried in increasing numerical
priority, we first try the rule with priority 50, then 51, then 52, 53, and
finally 54.

As a final note, remember that if you assign a rule a priority higher than 200,
it will be tried after a rule with the owise attribute, and if you assign
a rule a priority less than 50, it will be tried before a rule with no
explicit priority.

Exercises

  1. Write a function isEven that returns whether an integer is an even number.
    Use two rules and one side condition. The right-hand side of the rules should
    be Boolean literals. Refer back to
    domains.md for the relevant
    integer operations.

  2. Modify the calculator application from lesson 1.6, problem 2, so that division
    by zero will no longer make krun crash with a "Division by zero" exception.
    Instead, the / function should not match any of its rules if the denominator
    is zero.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.8: Literate Programming with Markdown.

Lesson 1.8: Literate Programming with Markdown

The purpose of this lesson is to teach a paradigm for performing literate
programming in K, and explain how this can be used to create K definitions
that are also documentation.

Markdown and K

The K tutorial so far has been written in
Markdown. Markdown,
for those not already familiar, is a lightweight plain-text format for styling
text. From this point onward, we assume you are familiar with Markdown and how
to write Markdown code. You can refer to the above link for a tutorial if you
are not already familiar.

What you may not necessarily realize, however, is that the K tutorial is also
a sequence of K definitions written in the manner of
Literate Programming.
For detailed information about Literate Programming, you can read the linked
Wikipedia article, but the short summary is that literate programming is a way
of intertwining documentation and code together in a manner that allows
executable code to also be, simultaneously, a documented description of that
code.

K is provided with built-in support for literate programming using Markdown.
By default, if you pass a file with the .md file extension to kompile, it
will look for any code blocks containing K code in that file, extract that
K code, and then compile it as if it were a .k file.

A K code block begins with a line containing the text ```k, and ends at the
next line containing ```.

For example, if you view the markdown source of this document, this is a K
code block:

module LESSON-08
  imports INT

Only the code inside K code blocks will actually be sent to the compiler. The
rest, while it may appear in the document when rendered by a markdown viewer,
is essentially a form of code comment.

When you have multiple K code blocks in a document, K will concatenate them
into a single file before passing it to the outer parser.

For example, the following code block contains sentences that are part of the
LESSON-08 module that we declared the beginning of above:

  syntax Int ::= Int "+" Int [function]
  rule I1 + I2 => I1 +Int I2

Exercise

Compile this file with kompile README.md --main-module LESSON-08. Confirm
that you can use the resulting compiled definition to evaluate the +
function.

Markdown Selectors

On occasion, you may want to generate multiple K definitions from a single
Markdown file. You may also wish to include a block of syntax-highlighted K
code that nonetheless does not appear as part of your K definition. It is
possible to accomplish this by means of the built-in support for syntax
highlighting in Markdown. Markdown allows a code block that was begun with
``` to be immediately followed by a string which is used to signify what
programming language the following code is written in. However, this feature
actually allows arbitrary text to appear describing that code block. Markdown
parsers are able to parse this text and render the code block differently
depending on what text appears after the backticks.

In K, you can use this functionality to specify one or more
Markdown selectors which are used to describe the code block. A Markdown
selector consists of a sequence of letters, numbers, and underscores. A code
block can be designated with a single selector by appending the selector
immediately after the backticks that open the code block.

For example, here is a code block with the foo selector:

foo bar

Note that this is not K code. By convention, K code should have the k
selector on it. You can express multiple selectors on a code block by putting
them between curly braces and prepending each with the . character. For
example, here is a code block with the foo and k selectors:

  syntax Int ::= foo(Int) [function]
  rule foo(0) => 0

Because this code block contains the k Markdown selector, by default it is
included as part of the K definition being compiled.
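
For reference, since the rendered tutorial hides the code fences, the opening
fence lines of the two code blocks above look like this in the raw Markdown
source, respectively:

```foo
```{.k .foo}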

Exercise

Confirm this fact by using krun to evaluate foo(0).

Markdown Selector Expressions

By default, as previously stated, K includes in the definition any code block
with the k selector. However, this is merely a specific instance of a general
principle, namely, that K allows you to control which selectors get included
in your K definition. This is done by means of the --md-selector flag to
kompile. This flag accepts a Markdown selector expression, which you
can essentially think of as a kind of Boolean algebra over Markdown selectors.
Each selector becomes an atom, and you can combine these atoms via the &,
|, !, and () operators.
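
For example, to compile only the code blocks tagged k that are not also tagged
foo, you might invoke kompile as follows (reusing the command from the earlier
exercise):

  kompile README.md --main-module LESSON-08 --md-selector "k & (! foo)"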

Here is a grammar, written in K, of the language of Markdown selector
expressions:

  syntax Selector ::= r"[0-9a-zA-Z_]+" [token]
  syntax SelectorExp ::= Selector
                       | "(" SelectorExp ")" [bracket]
                       > right:
                         "!" SelectorExp
                       > right:
                         SelectorExp "&" SelectorExp
                       > right:
                         SelectorExp "|" SelectorExp

Here is a selector expression that selects all the K code blocks in this
definition except the one immediately above:

k & (! selector)

Addendum

This code block exists in order to make the above lesson a syntactically valid
K definition. Consider why it is necessary.

endmodule

Exercises

  1. Compile this lesson with the selector expression k & (! foo) and confirm
    that you get a parser error if you try to evaluate the foo function with the
    resulting definition.

  2. Compile Lesson 1.3 as a K definition. Identify
    why it fails to compile. Then pass an appropriate --md-selector to the
    compiler in order to make it compile.

  3. Modify your calculator application from lesson 1.7, problem 2, to be written
    in a literate style. Consider what text might be appropriate to turn the
    resulting markdown file into documentation for your calculator.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.9: Unparsing and the format and color attributes.

Lesson 1.9: Unparsing and the format and color attributes

The purpose of this lesson is to teach the user about how terms are
pretty-printed in K, and how the user can make adjustments to the default
settings for how to print specific terms.

Parsing, Execution, and Unparsing

When you use krun to interpret a program, the tool passes through three major
phases. In the first, parsing, the program itself is parsed using either kast
or an ahead-of-time parser generated via Bison, and the resulting AST becomes
the input to the interpreter. In the second phase, execution, K evaluates
functions and (as we will discuss in depth later) performs rewrite steps to
iteratively transform the program state. The third and final phase is called
unparsing, because it consists of taking the final state of the application
after the program has been interpreted, and converting it from an AST back into
text that (in theory, anyway) could be parsed back into the same AST that was
the output of the execution phase.

In practice, unparsing is not always precisely reversible. It turns out
(although we will not cover exactly why here) that constructing a sound
algorithm which takes a grammar and an AST and emits text that parses via that
grammar to the original AST is an NP-hard problem. As a result, in the
interest of avoiding exponential-time algorithms when users rarely care about
unparsing being completely sound, we take certain shortcuts that provide a
linear-time algorithm approximating a sound solution, at the cost of the
guarantee that the result parses to the exact original term in all cases.

This is a lot of theoretical explanation, but at root, the unparsing process
is fairly simple: it takes a K term that is the output of execution and pretty
prints it according to the syntax defined by the user in their K definition.
This is useful because the original AST is not terribly user-readable, and it
is difficult to visualize the entire term or decipher information about the
final state of the program at a quick glance. Of course, in rare cases, the
pretty-printed configuration loses information of relevance, which is why K
allows you to obtain the original AST on request.

As an example of all of this, consider the following K definition
(lesson-09-a.k):

module LESSON-09-A
  imports BOOL

  syntax Exp ::= "(" Exp ")" [bracket]
               | Bool
               > "!" Exp
               > left:
                 Exp "&&" Exp
               | Exp "^" Exp
               | Exp "||" Exp

  syntax Exp ::= id(Exp) [function]
  rule id(E) => E
endmodule

This is similar to the grammar we defined in LESSON-06-C, with the difference
that the Boolean expressions are now constructors of sort Exp and we define a
trivial function over expressions that returns its argument unchanged.

We can now parse a simple program in this definition and use it to unparse some
Boolean expressions. For example (exp.bool):

id(true&&false&&!true^(false||true))

This program is not particularly legible at first glance, because all
extraneous whitespace has been removed. However, if we run krun exp.bool, we
see that the result of the unparser will pretty-print this expression rather
nicely:

<k>
  true && false && ! true ^ ( false || true ) ~> .
</k>

Notably, not only does K insert whitespace where appropriate, it is also smart
enough to insert parentheses where necessary in order to ensure the correct
parse. For example, without those parentheses, the expression above would parse
equivalent to the following one:

(((true && false) && ! true) ^ false) || true

Indeed, you can confirm this by passing that exact expression to the id
function and evaluating it, then looking at the result of the unparser:

<k>
  true && false && ! true ^ false || true ~> .
</k>

Here, because the meaning of the AST is the same both with and without
parentheses, K does not insert any parentheses when unparsing.

Exercise

Modify the grammar of LESSON-09-A above so that the binary operators are
right associative. Try unparsing exp.bool again, and note how the result is
different. Explain the reason for the difference.

Custom unparsing of terms

You may have noticed that right now, the unparsing of terms is not terribly
imaginative. All it is doing is taking each child of the term, inserting it
into the non-terminal positions of the production, then printing the production
with a space between each terminal or non-terminal. It is easy to see why this
might not be desirable in some cases. Consider the following K definition
(lesson-09-b.k):

module LESSON-09-B
  imports BOOL

  syntax Stmt ::= "{" Stmt "}" | "{" "}"
                > right:
                  Stmt Stmt
                | "if" "(" Bool ")" Stmt
                | "if" "(" Bool ")" Stmt "else" Stmt [avoid]
endmodule

This is a statement grammar, simplified to the point of meaninglessness, but
still useful as an object lesson in unparsing. Consider the following program
in this grammar (if.stmt):

if (true) {
  if (true) {}
  if (false) {}
  if (true) {
    if (false) {} else {}
  } else {
    if (false) {}
  }
}

This is how that term would be unparsed if it appeared in the output of krun:

if ( true ) { if ( true ) { } if ( false ) { } if ( true ) { if ( false ) { } else { } } else { if ( false ) { } } }

This is clearly much less legible than we started with! What are we to do?
Well, K provides an attribute, format, that can be applied to any production,
which controls how that production gets unparsed. You've seen how it gets
unparsed by default, but via this attribute, the developer has complete control
over how the term is printed. Of course, the user can trivially create ways to
print terms that would not parse back into the same term. Sometimes this is
even desirable. But in most cases, what you are interested in is controlling
the line breaking, indentation, and spacing of the production.

Here is an example of how you might choose to apply the format attribute
to improve how the above term is unparsed (lesson-09-c.k):

module LESSON-09-C
  imports BOOL

  syntax Stmt ::= "{" Stmt "}" [format(%1%i%n%2%d%n%3)] | "{" "}" [format(%1%2)]
                > right:
                  Stmt Stmt [format(%1%n%2)]
                | "if" "(" Bool ")" Stmt [format(%1 %2%3%4 %5)]
                | "if" "(" Bool ")" Stmt "else" Stmt [avoid, format(%1 %2%3%4 %5 %6 %7)]
endmodule

If we compile this new definition and unparse the same term, this is the
result we get:

if (true) {
  if (true) {}
  if (false) {}
  if (true) {
    if (false) {} else {}
  } else {
    if (false) {}
  }
}

This is the exact same text we started with! By adding the format attributes,
we were able to indent the body of code blocks, adjust the spacing of if
statements, and put each statement on a new line.

How exactly was this achieved? Well, each time the unparser reaches a term,
it looks at the format attribute of that term. That format attribute is a
mix of characters and format codes. Format codes begin with the %
character. Each character in the format attribute other than a format code is
appended verbatim to the output, and each format code is handled according to
its meaning, transformed (possibly recursively) into a string of text, and
spliced into the output at the position the format code appears in the format
string.

Provided for reference is a complete list of all valid format codes, each
followed by its meaning:

  - %n: Insert '\n' followed by the current indentation level.
  - %i: Increase the current indentation level by 1.
  - %d: Decrease the current indentation level by 1.
  - %c: Move to the next color in the list of colors for this production (see
    next section).
  - %r: Reset color to the default foreground color for the terminal (see
    next section).
  - % followed by an integer: Print a terminal or non-terminal from the
    production. The integer is treated as a 1-based index into the terminals
    and non-terminals of the production. If the offset refers to a terminal,
    move to the next color in the list of colors for this production, print
    the value of that terminal, then reset the color to the default foreground
    color for the terminal. If the offset refers to a regular-expression
    terminal, it is an error. If the offset refers to a non-terminal, unparse
    the corresponding child of the current term (starting with the current
    indentation level) and print the resulting text, then set the current
    color and indentation level to the color and indentation level following
    unparsing that term.
  - % followed by any other character: Print that character verbatim.
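
As a worked example, consider the attribute format(%1%i%n%2%d%n%3) on the
"{" Stmt "}" production of LESSON-09-C: %1 prints the { terminal, %i increases
the indentation level, %n inserts a newline followed by the new indentation,
%2 unparses the inner statement, %d decreases the indentation level, the
second %n inserts a newline at the restored indentation, and %3 prints the
closing } terminal. This is why block bodies appear indented on their own
lines in the output shown earlier.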

Exercise

Change the format attributes for LESSON-09-C so that if.stmt will unparse
as follows:

if (true)
{
  if (true)
  {
  }
  if (false)
  {
  }
  if (true)
  {
    if (false)
    {
    }
    else
    {
    }
  }
  else
  {
    if (false)
    {
    }
  }
}

Output coloring

When the output of unparsing is displayed on a terminal supporting colors, K
is capable of coloring the output, similar to what is possible with a syntax
highlighter. This is achieved via the color and colors attributes.

Essentially, both the color and colors attributes are used to construct a
list of colors associated with each production, and then the format attribute
is used to control how those colors are used to unparse the term. At its most
basic level, you can set the color attribute to color all the terminals in
the production a certain color, or you can use the colors attribute to
specify a comma-separated list of colors for each terminal in the production.
At a more advanced level, the %c and %r format codes control how the
formatter interacts with the list of colors specified by the colors
attribute. You can essentially think of the color attribute as a way of
specifying that you want all the colors in the list to be the same color.

Note that the %c and %r format codes are relatively primitive in nature.
The color and colors attributes merely maintain a list of colors, whereas
the %c and %r format codes merely control how to advance through that list
and how individual text is colored.

It is an error if the colors attribute does not provide all the colors needed
by the terminals and escape codes in the production. %r does not change the
position in the list of colors at all, so the next %c will advance to the
following color.
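
For instance (a sketch of ours; the choice of colors is arbitrary), the if
production from LESSON-09-B could assign one color to each of its three
terminals like so:

  syntax Stmt ::= "if" "(" Bool ")" Stmt [colors(yellow, white, white)]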

As a complete example, here is a variant of LESSON-09-A which colors the
various boolean operators:

module LESSON-09-D
  imports BOOL

  syntax Exp ::= "(" Exp ")" [bracket]
               | Bool
               > "!" Exp [color(yellow)]
               > left:
                 Exp "&&" Exp [color(red)]
               | Exp "^" Exp [color(blue)]
               | Exp "||" Exp [color(green)]

  syntax Exp ::= id(Exp) [function]
  rule id(E) => E
endmodule

For a complete list of allowed colors, see
here.

Exercises

  1. Use the color attribute on LESSON-09-C to color the keywords true and
    false one color, the keywords if and else another color, and the operators
    (, ), {, and } a third color.

  2. Use the format, color, and colors attributes to tell the unparser to
    style the expression grammar from lesson 1.8, problem 3 according to your own
    personal preferences for syntax highlighting and code formatting. You can
    view the result of the unparser on a function term without evaluating that
    function by means of the command kparse <file> | kore-print -.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.10: Strings.

Lesson 1.10: Strings

The purpose of this lesson is to explain how to use the String sort in K to
represent sequences of characters, and explain where to find additional
information about builtin functions over strings.

The String Sort

In addition to the Int and Bool sorts covered in
Lesson 1.6, K provides, among others, the
String sort to represent sequences of characters. You can import this
functionality via the STRING-SYNTAX module, which contains the syntax of
string literals in K, and the STRING module, which contains all the functions
that operate over the String type.

Strings in K are double-quoted. The following list of escape sequences is
supported:

  - \" : the literal character "
  - \\ : the literal character \
  - \n : the newline character (ASCII code 0x0a)
  - \r : the carriage return character (ASCII code 0x0d)
  - \t : the tab character (ASCII code 0x09)
  - \f : the form feed character (ASCII code 0x0c)
  - \x followed by 2 hexadecimal digits : a code point between 0x00 and 0xFF
  - \u followed by 4 hexadecimal digits : a code point between 0x0000 and 0xFFFF
  - \U followed by 8 hexadecimal digits : a code point between 0x000000 and 0x10FFFF

Please note that, at present, K's Unicode support is incomplete, so you may
run into errors using code points greater than 0xFF.

As an example, you can construct a string literal containing the following
block of text:

This is an example block of text.
Here is a quotation: "Hello world."
	This line is indented.
ÁÉÍÓÚ

Like so:

"This is an example block of text.\nHere is a quotation: \"Hello world.\"\n\tThis line is indented.\n\xc1\xc9\xcd\xd3\xda\n"

Basic String Functions

The full list of functions provided for the String sort can be found in
domains.md, but here we
describe a few of the more basic ones.

String concatenation

The concatenation operator for strings is +String. For example, consider
the following K rule that constructs a string from component parts
(lesson-10.k):

module LESSON-10
  imports STRING

  syntax String ::= msg(String) [function]
  rule msg(S) => "The string you provided: " +String S +String "\nHave a nice day!"
endmodule

Note that this operator is O(N), so repeated concatenations are inefficient.
For information about efficient string concatenation, refer to
Lesson 2.14.

String length

The function to return the length of a string is lengthString. For example,
lengthString("foo") will return 3, and lengthString("") will return 0.
The return value is the length of the string in code points.

Substring computation

The function to compute the substring of a string is substrString. It takes
a string and two integer indices, starting from 0, and returns the substring
within the range [start..end). It is only defined if end >= start, start >= 0,
and end <= the length of the string. Here, for example, we return the first 5
characters
of a string:

substrString(S, 0, 5)

Here we return all but the first 3 characters:

substrString(S, 3, lengthString(S))

Exercises

  1. Write a function that takes a paragraph of text (i.e., a sequence of
    sentences, each ending in a period), and constructs a new (nonsense) sentence
    composed of the first word of each sentence, followed by a period. Do not
    worry about capitalization or periods within the sentence which do not end the
    sentence (e.g., "Dr."). You can assume that all whitespace within the paragraph
    consists of spaces. For more information about the functions over strings required to
    implement such a function, refer to domains.md.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.11: Casting Terms.

Lesson 1.11: Casting Terms

The purpose of this lesson is to explain how to use cast expressions in
order to disambiguate terms using sort information. We also explain how the
variable sort inference algorithm works in K, and how to change the default
behavior by casting variables to a particular sort.

Casting in K

Sometimes the grammar you write for your rules in K can be a little bit
ambiguous on purpose. While grammars for programming languages may be
unambiguous when considered in their entirety, K allows you to write rules
involving arbitrary fragments of that grammar, and those fragments can
sometimes be ambiguous by themselves, or similar enough to other fragments
of the grammar to trigger ambiguity. As a result, in addition to the tools
covered in Lesson 1.4, K provides one
additional powerful tool for disambiguation: cast expressions.

K provides three main types of casts: the semantic cast, the strict cast, and
the projection cast. We will cover each of them, and their similarities and
differences, in turn.

Semantic casts

The most basic, and most common, type of cast in K is called the
semantic cast. For every sort S declared in a module, K provides the
following (implicit) production for use in sentences:

  syntax S ::= S ":S"

Note that S simply represents the name of the sort. For example, if we
defined a sort Exp, the actual production for that sort would be:

  syntax Exp ::= Exp ":Exp"

At runtime, this expression will not actually exist; it is merely an annotation
to the compiler describing the sort of the term inside the cast. It is telling
the compiler that the term inside the cast must be of sort Exp. For example,
if we had the following grammar:

module LESSON-11-A
  imports INT

  syntax Exp ::= Int | Exp "+" Exp
  syntax Stmt ::= "if" "(" Exp ")" Stmt | "{" "}"
endmodule

Then we would be able to write 1:Exp, or (1 + 2):Exp, but not {}:Exp.

You can also restrict the sort that a variable in a rule will match by casting
it. For example, consider the following additional module:

module LESSON-11-B
  imports LESSON-11-A
  imports BOOL

  syntax Term ::= Exp | Stmt
  syntax Bool ::= isExpression(Term) [function]

  rule isExpression(_E:Exp) => true
  rule isExpression(_) => false [owise]
endmodule

Here we have defined a very simple function that decides whether a term is
an expression or a statement. It does this by casting the variable inside the
isExpression rule to sort Exp. As a result, that variable will only match terms
of sort Exp. Thus, isExpression(1) will return true, as will isExpression(1 + 2), but
isExpression({}) will return false.

Exercise

Verify this fact for yourself by running isExpression on the above examples. Then
write an isStatement function, and test that it works as expected.

Strict casts

On occasion, a semantic cast is not strict enough. It might be that you want
to, for disambiguation purposes, say exactly what sort a term is. For
example, consider the following definition:

module LESSON-11-C
  imports INT

  syntax Exp ::= Int | Exp "+" Exp [exp]
  syntax Exp2 ::= Exp | Exp2 "+" Exp2 [exp2]
endmodule

This grammar is a little ambiguous and contrived, but it serves to demonstrate
how a semantic cast might be insufficient to disambiguate a term. If we were
to write the term (I1:Int + I2:Int):Exp2, the term would be ambiguous,
because the cast is not sufficiently strict to determine whether you mean
to derive the "+" production tagged exp, or the one tagged exp2.

In this situation, there is a solution: the strict cast. For every sort
S in your grammar, K also defines the following production:

  syntax S ::= S "::S"

This may at first glance seem the same as the previous cast. And indeed,
from the perspective of the grammar and from the perspective of rewriting,
they are in fact identical. However, the second variant has a unique meaning
in the type system of K: namely, the term inside the cast cannot be a
subsort, i.e., a term of another sort S2 such that the production
syntax S ::= S2 exists.

As a result, if we were to write in the above grammar the term
(I1:Int + I2:Int)::Exp2, then we would know that the second derivation above
should be chosen, whereas if we want the first derivation, we could write
(I1:Int + I2:Int)::Exp.

Projection casts

Thus far we have focused entirely on casts which exist solely to inform the
compiler about the sort of terms. However, sometimes when dealing with grammars
containing subsorts, it can be desirable to reason with the subsort production
itself, which injects one sort into another. Remember from above that such
a production looks like syntax S ::= S2. This type of production, called a
subsort production, can be thought of as a type of inheritance involving
constructors. If we have the above production in our grammar, we say that S2
is a subsort of S, or that any S2 is also an S. K implicitly maintains a
symbol at runtime which keeps track of where such subsortings occur; this
symbol is called an injection.

Sometimes, when one sort is a subsort of another, it can be the case that
a function returns one sort, but you actually want to cast the result of
calling that function to another sort which is a subsort of the first sort.
This is similar to what happens with inheritance in an object-oriented
language, where you might cast a superclass to a subclass if you know for
sure the object at runtime is in fact an instance of that class.

K provides something similar for subsorts: the projection cast.

For each pair of sorts S and S2, K provides the following production:

  syntax S ::= "{" S2 "}" ":>S"

What this means is that you take any term of sort S2 and cast it to sort
S. If the term of sort S2 consists of an injection containing a term of sort
S, then this will return that term. Otherwise, an error occurs and rewriting
fails, returning the projection function which failed to apply. The sort is
not actually checked at compilation time; rather, it is a runtime check
inserted into the code that runs when the rule applies.

For example, here is a module that makes use of projection casts:

module LESSON-11-D
  imports INT
  imports BOOL

  syntax Exp ::= Int | Bool | Exp "+" Exp | Exp "&&" Exp

  syntax Exp ::= eval(Exp) [function]
  rule eval(I:Int) => I
  rule eval(B:Bool) => B
  rule eval(E1 + E2) => {eval(E1)}:>Int +Int {eval(E2)}:>Int
  rule eval(E1 && E2) => {eval(E1)}:>Bool andBool {eval(E2)}:>Bool
endmodule

Here we have defined constructors for a simple expression language over
Booleans and integers, as well as a function eval that evaluates these
expressions to a value. Because that value could be an integer or a Boolean,
we need the casts in the last two rules in order to meet the type signature of
+Int and andBool. Of course, the user can write ill-formed expressions like
1 && true or false + true, but these will cause errors at runtime, because
the projection cast will fail.

Exercises

  1. Extend the eval function in LESSON-11-D to include Strings and add a .
    operator which concatenates them.

  2. Modify your solution from lesson 1.9, problem 2 by using an Exp sort to
    express the integer and Boolean expressions that it supports, in the same style
    as LESSON-11-D. Then write an eval function that evaluates all terms of
    sort Exp to either a Bool or an Int.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.12: Syntactic Lists.

Lesson 1.12: Syntactic Lists

The purpose of this lesson is to explain how K provides support for syntactic
repetition through the use of the List{} and NeList{} constructs,
generally called syntactic lists.

The List{} construct

Sometimes, when defining a grammar in K, it is useful to define a syntactic
construct consisting of an arbitrary-length sequence of items. For example,
you might wish to define a function call construct, and need to express a way
of passing arguments to the function. You can in theory simply define these
productions using ordinary constructors, but it can be tricky to get the syntax
exactly right in K without a lot of tedious glue code.

For this reason, K provides a way of specifying that a non-terminal represents
a syntactic list (lesson-12-a.k):

module LESSON-12-A-SYNTAX
  imports INT-SYNTAX

  syntax Ints ::= List{Int,","}
endmodule

module LESSON-12-A
  imports LESSON-12-A-SYNTAX
endmodule

Note that instead of a sequence of terminals and non-terminals, the right hand
side of the Ints production contains the symbol List followed by two items
in curly braces. The first item is the non-terminal which is the element type
of the list, and the second item is a terminal representing the separator of
the list. As a special case, lists which are separated only by whitespace can
be specified with a separator of "".
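
For example (an illustration of ours, assuming some Stmt sort has been
declared), a whitespace-separated list of statements could be declared as:

  syntax Stmts ::= List{Stmt,""}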

This List{} construct is roughly equivalent to the following definition
(lesson-12-b.k):

module LESSON-12-B-SYNTAX
  imports INT-SYNTAX

  syntax Ints ::= Int "," Ints | ".Ints"
endmodule

module LESSON-12-B
  imports LESSON-12-B-SYNTAX
endmodule

As you can see, the List{} construct represents a cons-list with an element
at the head and another list at the tail. The empty list is represented by
a . followed by the sort of the list.

However, the List{} construct provides several key syntactic conveniences
over the above definition. First of all, when writing a list in a rule,
explicitly writing the terminator is not always required. For example, consider
the following additional module (lesson-12-c.k):

module LESSON-12-C
  imports LESSON-12-A
  imports INT

  syntax Int ::= sum(Ints) [function]
  rule sum(I:Int) => I
  rule sum(I1:Int, I2:Int, Is:Ints) => sum(I1 +Int I2, Is)
endmodule

Here we see a function that sums together a non-empty list of integers. Note in
particular the first rule. We do not explicitly mention .Ints, but in fact,
the rule in question is equivalent to the following rule:

  rule sum(I:Int, .Ints) => I

The reason for this is that K will automatically insert a list terminator
anywhere a syntactic list is expected, but an element of that list appears
instead. This works even with lists of more than one element:

  rule sum(I1:Int, I2:Int) => I1 +Int I2

This rule is redundant, but here we explicitly match a list of exactly two
elements, because the .Ints is implicitly added after I2.

Exercise

Write a function concat which takes a list of String and concatenates them
all together. Do not worry if the function is O(n^2).

Parsing Syntactic Lists in Programs

An additional syntactic convenience applies when you want to express a
syntactic list in the input to krun. In this case, K will automatically
transform the grammar in LESSON-12-B-SYNTAX into the following
(lesson-12-d.k):

module LESSON-12-D
  imports INT-SYNTAX

  syntax Ints ::= #NonEmptyInts | #IntsTerminator
  syntax #NonEmptyInts ::= Int "," #NonEmptyInts
                         | Int #IntsTerminator
  syntax #IntsTerminator ::= ""
endmodule

This allows you to express the usual comma-separated list of arguments where
an empty list is represented by the empty string, and you don't have to
explicitly terminate the list. Because of this, we can write the syntax
of function calls in C very easily (lesson-12-e.k):

module LESSON-12-E
  syntax Id ::= r"[a-zA-Z_][a-zA-Z0-9_]*" [token]
  syntax Exp ::= Id | Exp "(" Exps ")"
  syntax Exps ::= List{Exp,","}
endmodule

Exercise

Write some function call expressions using identifiers in C and verify with
kast that the above grammar captures the intended syntax. Make sure to test
with function calls with zero, one, and two or more arguments.

The NeList{} construct

One limitation of the List{} construct is that it is always possible to
write a list of zero elements where a List{} is expected. While this is
desirable in a number of cases, it is sometimes not what the grammar expects.

For example, in C, it is not allowable for an enum definition to have zero
members. In other words, if we were to write the grammar for enumerations like
so (lesson-12-f.k):

module LESSON-12-F
  syntax Id ::= r"[a-zA-Z_][a-zA-Z0-9_]*" [token]
  syntax Exp ::= Id

  syntax EnumSpecifier ::= "enum" Id "{" Ids "}"
  syntax Ids ::= List{Id,","}
endmodule

Then we would be syntactically allowed to write enum X {}, which ought
instead to be a syntax error.

For this reason, we introduce the additional NeList{} construct. The syntax
is identical to List{}, except with NeList instead of List before the
curly braces. When parsing rules, it behaves identically to the List{}
construct. However, when parsing inputs to krun, the above grammar, if we
replaced syntax Ids ::= List{Id,","} with syntax Ids ::= NeList{Id,","},
would become equivalent to the following (lesson-12-g.k):

module LESSON-12-G
  syntax Id ::= r"[a-zA-Z_][a-zA-Z0-9_]*" [token]
  syntax Exp ::= Id

  syntax EnumSpecifier ::= "enum" Id "{" Ids "}"
  syntax Ids ::= Id | Id "," Ids
endmodule

In other words, only non-empty lists of Id would be allowed.

Exercises

  1. Modify the sum function in LESSON-12-C so that the Ints sort is an
    NeList{}. Verify that calling sum() with no arguments is now a syntax
    error.

  2. Write a modified sum function with the List construct that can also sum
    up an empty list of arguments. In such a case, the sum ought to be 0.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.13: Basics of K Rewriting.

Lesson 1.13: Basics of K Rewriting

The purpose of this lesson is to explain how rewrite rules that are not the
definition of a function behave, and how, using these rules, you can construct
a semantics of programs in a programming language in K.

Recap: Function rules in K

Recall from Lesson 1.2 that we have, thus far,
introduced two types of productions in K: constructors and functions.
A function is identified by the function attribute placed on the
production. As you may recall, when we write a rule with a function on the
left-hand side of the => operator, we are defining the meaning of that
function for inputs which match the patterns on the left-hand side of the rule.
If the arguments to the function match the patterns, then the function is
evaluated to the value constructed by substituting the bindings for the
variables into the right-hand side of the rule.

Top-level rules

However, function rules are not the only type of rule permissible in K, nor
even the most frequently used. K also has a concept of a
top-level rewrite rule. The simplest way to ensure that a rule is treated
as a top-level rule is for the left-hand side of the rule to mention one or
more cells. We will cover how cells work and are declared in more detail
in a later lesson, but for now, what you should know is that when we ran krun
in our very first example in Lesson 1.2 and got the following output:

<k>
  Yellow ( ) ~> .
</k>

<k> is a cell, known by convention as the K cell. This cell is available
by default in any definition without needing to be explicitly declared.

The K cell contains a single term of sort K. K is a predefined sort in K
with two constructors, that can be roughly represented by the following
grammar:

  syntax K ::= KItem "~>" K
             | "."

As a syntactic convenience, K allows you to treat ~> like it is an
associative list (i.e., as if it were defined as syntax K ::= K "~>" K), but
when a definition is compiled, it will automatically transform the rules you
write so that they treat the K sort as a cons-list. Another syntactic
convenience is that, for disambiguation purposes, you can write .K anywhere
you would otherwise write . and the meaning is identical.

Now, you may notice that the above grammar mentions the sort KItem. This is
another built-in sort in K. For every sort S declared in a definition (with
the exception of K and KItem), K will implicitly insert the following
production:

  syntax KItem ::= S

In other words, every sort is a subsort of the sort KItem, and thus a term
of any sort can be injected as an element of a term of sort K, also called
a K sequence.
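
For example, assuming a definition declaring sorts Int and Bool, the following
is a K sequence containing two items:

  1 ~> true ~> .K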

By default, when you krun a program, the AST of the program is inserted as
the sole element of a K sequence into the <k> cell. This explains why we
saw the output we did in Lesson 1.2.

With these preliminaries in mind, we can now explain how top-level rewrite
rules work in K. Put simply, any rule where there is a cell (such as the K
cell) at the top on the left-hand side will be a top-level rewrite rule. Once
the initial program has been inserted into the K cell, the resulting term,
called the configuration, will be matched against all the top-level
rewrite rules in the definition. If only one rule matches, the substitution
generated by the matching will be applied to the right-hand side of the rule
and the resulting term is rewritten to be the new configuration. Rewriting
proceeds by iteratively applying rules, also called taking steps, until
no top-level rewrite rule can be applied. At this point the configuration
becomes the final configuration and is output by krun.
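
As a minimal sketch (the module and sort names are ours), here is a definition
containing a single top-level rewrite rule; a configuration whose <k> cell
begins with foo rewrites in one step to one beginning with bar:

module LESSON-13-SKETCH
  syntax Sym ::= "foo" | "bar"

  // The cell on the left-hand side makes this a top-level rewrite rule.
  rule <k> foo ~> K:K </k> => <k> bar ~> K </k>
endmodule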

If more than one top-level rule applies, by default, K will pick just one
of those rules, apply it, and continue rewriting. However, it is
non-deterministic which rule applies. In theory, it could be any of them.
By passing the --search flag to krun, you are able to tell krun to
explore all possible non-deterministic choices, and generate a complete list of
all possible final configurations reachable by each non-deterministic choice that
can be made. Note that the --search flag to krun only works if you pass
--enable-search to kompile first.

Exercise

Pass a program containing no functions to krun. You can use a term of sort
Exp from LESSON-11-D. Observe the output and try to understand why you get
the output you do. Then write two rules that rewrite that program to another.
Run krun --search on that program and observe both results. Then add a third
rule that rewrites one of those results again. Test that that rule applies as
well.

Using top-level rules to evaluate expressions

Thus far, we have focused primarily on defining functions over constructors
in K. However, now that we have a basic understanding of top-level rules,
it is possible to introduce a rewrite system to our definitions. A rewrite
system is a collection of top-level rewrite rules which performs an organized
transformation of a particular program into a result which expresses the
meaning of that program. For example, we might rewrite an expression in a
programming language into a value representing the result of evaluating that
expression.

Recall in Lesson 1.11, we wrote a simple grammar of Boolean and integer
expressions that looked roughly like this (lesson-13-a.k):

module LESSON-13-A
  imports INT

  syntax Exp ::= Int
               | Bool
               | Exp "+" Exp
               | Exp "&&" Exp
endmodule

In that lesson, we defined a function eval which evaluated such expressions
to either an integer or Boolean.

However, it is more idiomatic to evaluate such expressions using top-level
rewrite rules. Here is how one might do so in K (lesson-13-b.k):

module LESSON-13-B-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  imports BOOL-SYNTAX

  syntax Val ::= Int | Bool
  syntax Exp ::= Val
               > left: Exp "+" Exp
               > left: Exp "&&" Exp
endmodule

module LESSON-13-B
  imports LESSON-13-B-SYNTAX
  imports INT
  imports BOOL

  rule <k> I1:Int + I2:Int ~> K:K </k> => <k> I1 +Int I2 ~> K </k>
  rule <k> B1:Bool && B2:Bool ~> K:K </k> => <k> B1 andBool B2 ~> K </k>

  syntax KItem ::= freezer1(Val) | freezer2(Exp)
                 | freezer3(Val) | freezer4(Exp)

  rule <k> E1:Val + E2:Exp ~> K:K </k> => <k> E2 ~> freezer1(E1) ~> K </k> [priority(51)]
  rule <k> E1:Exp + E2:Exp ~> K:K </k> => <k> E1 ~> freezer2(E2) ~> K </k> [priority(52)]
  rule <k> E1:Val && E2:Exp ~> K:K </k> => <k> E2 ~> freezer3(E1) ~> K </k> [priority(51)]
  rule <k> E1:Exp && E2:Exp ~> K:K </k> => <k> E1 ~> freezer4(E2) ~> K </k> [priority(52)]

  rule <k> E2:Val ~> freezer1(E1) ~> K:K </k> => <k> E1 + E2 ~> K </k>
  rule <k> E1:Val ~> freezer2(E2) ~> K:K </k> => <k> E1 + E2 ~> K </k>
  rule <k> E2:Val ~> freezer3(E1) ~> K:K </k> => <k> E1 && E2 ~> K </k>
  rule <k> E1:Val ~> freezer4(E2) ~> K:K </k> => <k> E1 && E2 ~> K </k>
endmodule

This is of course rather cumbersome currently, but we will soon introduce
syntactic convenience which makes writing definitions of this type considerably
easier. For now, notice that there are roughly three types of rules here: the first
matches a K cell in which the first element of the K sequence is an Exp whose
arguments are values, and rewrites the first element of the sequence to the
result of that expression. The second also matches a K cell with an Exp in
the first element of its K sequence, but it matches when one or both arguments
of the Exp are not values, and replaces the first element of the K sequence
with two new elements: one being an argument to evaluate, and the other being
a special constructor called a freezer. Finally, the third matches a K
sequence where a Val is first, and a freezer is second, and replaces them
with a partially evaluated expression.

This general pattern is what is known as heating an expression,
evaluating its arguments, cooling the arguments into the expression
again, and evaluating the expression itself. By repeatedly performing
this sequence of actions, we can evaluate an entire AST containing a complex
expression down into its resulting value.
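
For example, under the rules of LESSON-13-B, the program 1 + 2 + 3 (which
parses as (1 + 2) + 3, since + is left associative) passes through the
following sequence of configurations, one rewrite step per line (our worked
trace of the contents of the <k> cell):

  (1 + 2) + 3 ~> .
  1 + 2 ~> freezer2(3) ~> .
  3 ~> freezer2(3) ~> .
  3 + 3 ~> .
  6 ~> .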

Exercise

Write an addition expression with integers. Use krun --depth 1 to see the
result of rewriting after applying a single top-level rule. Gradually increase
the value of --depth to see successive states. Observe how this combination
of rules is eventually able to evaluate the entire expression.

Simplifying the evaluator: Local rewrites and cell ellipses

As you saw above, the definition we wrote is rather cumbersome. Over the
remainder of Lessons 1.13 and 1.14, we will greatly simplify it. The first step
in doing so is to teach a bit more about the rewrite operator, =>. Thus far,
all the rules we have written look like rule LHS => RHS. However, this is not
the only way the rewrite operator can be used. It is actually possible to place
a constructor or function at the very top of the rule, and place rewrite
operators inside that term. While a rewrite operator cannot appear nested
inside another rewrite operator, writing rewrites inside a term lets us express
that some parts of what we are matching are not changed by the rule. For
example, consider the following rule from above:

  rule <k> I1:Int + I2:Int ~> K:K </k> => <k> I1 +Int I2 ~> K </k>

We can equivalently write it as follows:

  rule <k> (I1:Int + I2:Int => I1 +Int I2) ~> _:K </k>

When you put a rewrite inside a term like this, in essence, you are telling
the rule to only rewrite part of the left-hand side to the right-hand side.
In practice, this is implemented by lifting the rewrite operator to the top of
the rule by means of duplicating the surrounding context.

There is a way that the above rule can be simplified further, however. K
provides a special syntax for each cell containing a term of sort K, indicating
that we want to match only on some prefix of the K sequence. For example, the
above rule can be simplified further like so:

  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>

Here we have placed the symbol ... immediately prior to the </k> which ends
the cell. What this tells the compiler is to take the contents of the cell,
treat it as the prefix of a K sequence, and insert an anonymous variable of
sort K at the end. Thus we can think of ... as a way of saying we
don't care about the part of the K sequence after the beginning, leaving
it unchanged.

Putting all this together, we can rewrite LESSON-13-B like so
(lesson-13-c.k):

module LESSON-13-C-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  imports BOOL-SYNTAX

  syntax Val ::= Int | Bool
  syntax Exp ::= Val
               > left: Exp "+" Exp
               > left: Exp "&&" Exp
endmodule

module LESSON-13-C
  imports LESSON-13-C-SYNTAX
  imports INT
  imports BOOL

  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>
  rule <k> B1:Bool && B2:Bool => B1 andBool B2 ...</k>

  syntax KItem ::= freezer1(Val) | freezer2(Exp)
                 | freezer3(Val) | freezer4(Exp)

  rule <k> E1:Val + E2:Exp => E2 ~> freezer1(E1) ...</k> [priority(51)]
  rule <k> E1:Exp + E2:Exp => E1 ~> freezer2(E2) ...</k> [priority(52)]
  rule <k> E1:Val && E2:Exp => E2 ~> freezer3(E1) ...</k> [priority(51)]
  rule <k> E1:Exp && E2:Exp => E1 ~> freezer4(E2) ...</k> [priority(52)]

  rule <k> E2:Val ~> freezer1(E1) => E1 + E2 ...</k>
  rule <k> E1:Val ~> freezer2(E2) => E1 + E2 ...</k>
  rule <k> E2:Val ~> freezer3(E1) => E1 && E2 ...</k>
  rule <k> E1:Val ~> freezer4(E2) => E1 && E2 ...</k>
endmodule

This is still rather cumbersome, but it is already greatly simplified. In the
next lesson, we will see how additional features of K can be used to specify
heating and cooling rules much more compactly.

Exercises

  1. Modify LESSON-13-C to add rules to evaluate integer subtraction.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.14: Defining Evaluation Order.

Lesson 1.14: Defining Evaluation Order

The purpose of this lesson is to explain how to use the heat and cool
attributes, context and context alias sentences, and the strict and
seqstrict attributes to more compactly express heating and cooling in K,
and to express more advanced evaluation strategies in K.

The heat and cool attributes

Thus far, we have been using rule priority and casts to express when to heat
an expression and when to cool it. For example, the rules for heating have
lower priority, so they do not apply if the term could be evaluated instead,
and both the heating and the cooling rules use the Val sort to ensure that
they apply only once the relevant argument has been evaluated to a value.

However, K has built-in support for deciding when to heat and when to cool.
This support comes in the form of the rule attributes heat and cool as
well as the specially named function isKResult.

Consider the following definition, which is equivalent to LESSON-13-C
(lesson-14-a.k):

module LESSON-14-A-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  imports BOOL-SYNTAX

  syntax Exp ::= Int
               | Bool
               > left: Exp "+" Exp
               > left: Exp "&&" Exp
endmodule

module LESSON-14-A
  imports LESSON-14-A-SYNTAX
  imports INT
  imports BOOL

  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>
  rule <k> B1:Bool && B2:Bool => B1 andBool B2 ...</k>

  syntax KItem ::= freezer1(Exp) | freezer2(Exp)
                 | freezer3(Exp) | freezer4(Exp)

  rule <k> E:Exp + HOLE:Exp => HOLE ~> freezer1(E) ...</k>
    requires isKResult(E) [heat]
  rule <k> HOLE:Exp + E:Exp => HOLE ~> freezer2(E) ...</k> [heat]
  rule <k> E:Exp && HOLE:Exp => HOLE ~> freezer3(E) ...</k>
    requires isKResult(E) [heat]
  rule <k> HOLE:Exp && E:Exp => HOLE ~> freezer4(E) ...</k> [heat]

  rule <k> HOLE:Exp ~> freezer1(E) => E + HOLE ...</k> [cool]
  rule <k> HOLE:Exp ~> freezer2(E) => HOLE + E ...</k> [cool]
  rule <k> HOLE:Exp ~> freezer3(E) => E && HOLE ...</k> [cool]
  rule <k> HOLE:Exp ~> freezer4(E) => HOLE && E ...</k> [cool]

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_:Bool) => true
  rule isKResult(_) => false [owise]
endmodule

We have introduced three major changes to this definition. First, we have
removed the Val sort. We replace it instead with a function isKResult.
The function in question must have the same signature and attributes as seen in
this example. It ought to return true whenever a term should not be heated
(because it is a value) and false when it should be heated (because it is not
a value). We thus also insert isKResult calls in the side condition of two
of the heating rules, where the Val sort was previously used.

Second, we have removed the rule priorities on the heating rules and the use of
the Val sort on the cooling rules, and replaced them with the heat and
cool attributes. These attributes instruct the compiler that these rules are
heating and cooling rules, and thus should implicitly apply only when certain
terms on the LHS either are or are not a KResult (i.e., isKResult returns
true versus false).

Third, we have renamed some of the variables in the heating and cooling rules
to the special variable HOLE. Syntactically, HOLE is just a special name
for a variable, but it is treated specially by the compiler. By naming a
variable HOLE, we have informed the compiler which term is being heated
or cooled. The compiler will automatically add the side condition
requires isKResult(HOLE) to cooling rules and the side condition
requires notBool isKResult(HOLE) to heating rules.

Exercise

Modify LESSON-14-A to add rules to evaluate integer subtraction.

Simplifying further with Contexts

The above example is still rather cumbersome to write. We must explicitly write
both the heating and the cooling rule separately, even though they are
essentially inverses of one another. It would be nice to instead simply
indicate which terms should be heated and cooled, and what part of them to
operate on.

To do this, K introduces a new type of sentence, the context. Contexts
begin with the context keyword instead of the rule keyword, and usually
do not contain a rewrite operator.

Consider the following definition which is equivalent to LESSON-14-A
(lesson-14-b.k):

module LESSON-14-B-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  imports BOOL-SYNTAX

  syntax Exp ::= Int
               | Bool
               > left: Exp "+" Exp
               > left: Exp "&&" Exp
endmodule

module LESSON-14-B
  imports LESSON-14-B-SYNTAX
  imports INT
  imports BOOL

  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>
  rule <k> B1:Bool && B2:Bool => B1 andBool B2 ...</k>

  context <k> E:Exp + HOLE:Exp ...</k>
    requires isKResult(E)
  context <k> HOLE:Exp + _:Exp ...</k>
  context <k> E:Exp && HOLE:Exp ...</k>
    requires isKResult(E)
  context <k> HOLE:Exp && _:Exp ...</k>

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_:Bool) => true
  rule isKResult(_) => false [owise]
endmodule

In this example, the heat and cool rules have been removed entirely, as
have been the productions defining the freezers. Don't worry, they still exist
under the hood; the compiler is just generating them automatically. For each
context sentence like above, the compiler generates a #freezer production,
a heat rule, and a cool rule. The generated form is equivalent to the
rules we wrote manually in LESSON-14-A. However, we are now starting to
considerably simplify the definition. Instead of 3 sentences, we just have one.
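
For example, for the context <k> HOLE:Exp + _:Exp ...</k>, the generated
sentences look roughly like the following sketch (the freezer name here is
illustrative; the compiler chooses its own generated name):

  syntax KItem ::= #freezer0(Exp)
  rule <k> HOLE:Exp + E:Exp => HOLE ~> #freezer0(E) ...</k> [heat]
  rule <k> HOLE:Exp ~> #freezer0(E) => HOLE + E ...</k> [cool]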

context alias sentences and the strict and seqstrict attributes

Notice that the contexts we included in LESSON-14-B still seem rather
similar in form. For each expression we want to evaluate, we are declaring
one context for each operand of that expression, and they are each rather
similar to one another. We would like to be able to simplify further by
simply annotating each expression production with information about how
it is to be evaluated instead. We can do this with the seqstrict attribute.

Consider the following definition, once again equivalent to those above
(lesson-14-c.k):

module LESSON-14-C-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  imports BOOL-SYNTAX

  syntax Exp ::= Int
               | Bool
               > left: Exp "+" Exp [seqstrict(exp; 1, 2)]
               > left: Exp "&&" Exp [seqstrict(exp; 1, 2)]
endmodule

module LESSON-14-C
  imports LESSON-14-C-SYNTAX
  imports INT
  imports BOOL

  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>
  rule <k> B1:Bool && B2:Bool => B1 andBool B2 ...</k>

  context alias [exp]: <k> HERE ...</k>

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_:Bool) => true
  rule isKResult(_) => false [owise]
endmodule

This definition has two important changes from the one above. The first is
that the individual context sentences have been removed and have been
replaced with a single context alias sentence. You may notice that this
sentence begins with an identifier in square brackets followed by a colon. This
syntax is a way of naming individual sentences in K for reference by the tool
or by other sentences. The context alias sentence also has a special variable
HERE.

The second is that the productions in LESSON-14-C-SYNTAX have been given a
seqstrict attribute. The value of this attribute has two parts. The first
is the name of a context alias sentence. The second is a comma-separated list
of integers. Each integer represents an index of a non-terminal in the
production, counting from 1. For each integer present, the compiler implicitly
generates a new context sentence according to the following rules:

  1. The compiler starts by looking for the context alias sentence named. If
    there is more than one, then one context sentence is created per
    context alias sentence with that name.
  2. For each context created, the variable HERE in the context alias is
    substituted with an instance of the production the seqstrict attribute is
    attached to. Each child of that production is a variable. The non-terminal
    indicated by the integer offset of the seqstrict attribute is given the name
    HOLE.
  3. For each integer offset earlier in the list than the one currently being
    processed, the predicate isKResult(E) is conjoined into the side
    condition, where E is the child of the production term at that offset,
    counting from 1. For example, if the attribute lists 1, 2, then the
    context generated for the 2 will include isKResult(E1), where E1 is the
    first child of the production.

As you can see if you work through the process, the above code will ultimately
generate the same contexts present in LESSON-14-B.
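
For example, working through these steps for
syntax Exp ::= Exp "+" Exp [seqstrict(exp; 1, 2)] yields roughly the
following two contexts (variable names are illustrative):

  // from offset 1: no earlier offsets, so no side condition
  context <k> HOLE:Exp + _:Exp ...</k>
  // from offset 2: offset 1 precedes it, so its child must be a KResult
  context <k> E1:Exp + HOLE:Exp ...</k>
    requires isKResult(E1)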

Finally, note that there are a few minor syntactic conveniences provided by the
seqstrict attribute. First, in the special case of the context alias sentence
being <k> HERE ...</k>, you can omit both the context alias sentence
and the name from the seqstrict attribute.

Second, if the numbered list of offsets contains every non-terminal in the
production, it can be omitted from the attribute value.

Thus, we can finally produce the idiomatic K definition for this example
(lesson-14-d.k):

module LESSON-14-D-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  imports BOOL-SYNTAX

  syntax Exp ::= Int
               | Bool
               > left: Exp "+" Exp [seqstrict]
               > left: Exp "&&" Exp [seqstrict]
endmodule

module LESSON-14-D
  imports LESSON-14-D-SYNTAX
  imports INT
  imports BOOL

  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>
  rule <k> B1:Bool && B2:Bool => B1 andBool B2 ...</k>

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_:Bool) => true
  rule isKResult(_) => false [owise]
endmodule

Exercise

Modify LESSON-14-D to add a production and rule to evaluate integer
subtraction.

Nondeterministic evaluation order with the strict attribute

Thus far, we have focused entirely on deterministic evaluation order. However,
not all languages are deterministic in the order they evaluate expressions.
For example, in C, the expression a() + b() + c() is guaranteed to parse
to (a() + b()) + c(), but it is not guaranteed that a will be called before
b before c. In fact, this evaluation order is non-deterministic.

We can express non-deterministic evaluation orders with the strict attribute.
Its behavior is identical to the seqstrict attribute, except that step 3 in
the above list (with the side condition automatically added) does not take
place. In other words, if we wrote syntax Exp ::= Exp "+" Exp [strict]
instead of syntax Exp ::= Exp "+" Exp [seqstrict], it would generate the
following two contexts instead of the ones found in LESSON-14-B:

  context <k> _:Exp + HOLE:Exp ...</k>
  context <k> HOLE:Exp + _:Exp ...</k>

As you can see, these contexts will generate heating rules that can both
apply to the same term. As a result, the choice of which heating rule
applies first is non-deterministic, and as we saw in Lesson 1.13, we can
get all possible behaviors by passing --search to krun.

Exercises

  1. Add integer division to LESSON-14-D. Make division and addition strict
    instead of seqstrict, and write a rule evaluating integer division with a
    side condition that the denominator is non-zero. Run krun --search on the
    program 1 / 0 + 2 / 1 and observe all possible outputs of the program. How
    many are there total, and why?

  2. Modify your solution from lesson 1.11 problem 2 to remove the eval
    function and instead evaluate expressions from left to right using the
    seqstrict attribute.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.15: Configuration Declarations and Cell Nesting.

Lesson 1.15: Configuration Declarations and Cell Nesting

The purpose of this lesson is to explain how to store additional information
about the state of your interpreter by declaring cells using the
configuration sentence, as well as how to add additional inputs to your
definition.

Cells and Configuration Declarations

We have already covered the absolute basics of cells in K by looking at the
<k> cell. As explained in Lesson 1.13, the
<k> cell is available without being explicitly declared. It turns out this is
because, if the user does not explicitly specify a configuration sentence
anywhere in the main module of their definition, the configuration sentence
from the DEFAULT-CONFIGURATION module of
kast.md is imported
automatically. Here is what that sentence looks like:

  configuration <k> $PGM:K </k>

This configuration declaration declares a single cell, the <k> cell. It also
declares that at the start of rewriting, the contents of that cell should be
initialized with the value of the $PGM configuration variable.
Configuration variables function as inputs to krun. These terms are supplied
to krun in the form of ASTs parsed using a particular module. By default, the
$PGM configuration variable uses the main syntax module of the definition.

The cast on the configuration variable also specifies the sort that is used as
the entry point to the parser, in this case the K sort. It is often
useful to cast to other sorts there as well for better control over the accepted
language. The sort used for the $PGM variable is referred to as the start
symbol. During parsing, the default start symbol K subsumes all user-defined
sorts except for syntactic lists. These are excluded because they will always
produce an ambiguity error when parsing a single element.
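
For example, a definition whose programs are expressions might declare the
following (a sketch, assuming an Exp sort is declared in the main syntax
module):

  configuration <k> $PGM:Exp </k>

With this declaration, krun parses its input using Exp as the start symbol,
rejecting any input that is not a valid Exp.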

Note that we did not explicitly specify the $PGM configuration variable when
we invoked krun on a file. This is because krun handles the $PGM variable
specially, and allows you to pass the term for that variable via a file passed
as a positional argument to krun. We did, however, specify the PGM name
explicitly when we called krun with the -cPGM command line argument in
Lesson 1.2. This is the other, explicit, way of
specifying an input to krun.

This explains the most basic use of configuration declarations in K. We can,
however, declare multiple cells and multiple configuration variables. We can
also specify the initial values of cells statically, rather than dynamically
via krun.

For example, consider the following definition (lesson-15-a.k):

module LESSON-15-A-SYNTAX
  imports INT-SYNTAX

  syntax Ints ::= List{Int,","}
endmodule

module LESSON-15-A
  imports LESSON-15-A-SYNTAX
  imports INT

  configuration <k> $PGM:Ints </k>
                <sum> 0 </sum>

  rule <k> I:Int, Is:Ints => Is ...</k>
       <sum> SUM:Int => SUM +Int I </sum>
endmodule

This simple definition takes a list of integers as input and sums them
together. Here we have declared two cells: <k> and <sum>. Unlike <k>,
<sum> does not get initialized via a configuration variable, but instead
is initialized statically with the value 0.

Note the rule in the second module: we have explicitly specified multiple
cells in a single rule. K will expect each of these cells to match in order for
the rule to apply.
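
For example, after kompiling this definition, we can run it on a small input
(the file name here is illustrative). If input.ints contains 1, 2, 3, then:

  kompile lesson-15-a.k
  krun input.ints

should terminate with <sum> 6 </sum> in the final configuration, with the
<k> cell containing only the empty list of integers.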

Here is a second example (lesson-15-b.k):

module LESSON-15-B-SYNTAX
  imports INT-SYNTAX
endmodule

module LESSON-15-B
  imports LESSON-15-B-SYNTAX
  imports INT
  imports BOOL

  configuration <k> . </k>
                <first> $FIRST:Int </first>
                <second> $SECOND:Int </second>

  rule <k> . => FIRST >Int SECOND </k>
       <first> FIRST </first>
       <second> SECOND </second>
endmodule

This definition takes two integers as command-line arguments and populates the
<k> cell with a Boolean indicating whether the first integer is greater than
the second. Notice that we have specified no $PGM configuration variable
here. As a result, we cannot invoke krun via the syntax krun $file.
Instead, we must explicitly pass values for each configuration variable via the
-cFIRST and -cSECOND command line flags. For example, if we invoke
krun -cFIRST=0 -cSECOND=1, we will get the value false in the K cell.

You can also specify both a $PGM configuration variable and other
configuration variables in a single configuration declaration, in which case
you would be able to initialize $PGM with either a positional argument or the
-cPGM command line flag, but the other configuration variables would need
to be explicitly initialized with -c.
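
For example, a declaration along the following lines (the $FLAG variable is
illustrative) combines both styles:

  configuration <k> $PGM:K </k>
                <flag> $FLAG:Bool </flag>

Here you could pass the program via a positional argument while supplying the
flag explicitly, as in krun program.txt -cFLAG=true.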

Exercise

Modify your solution to Lesson 1.14, Problem 2 to add a new cell with a
configuration variable of sort Bool. This variable should determine whether
the / operator is evaluated using /Int or divInt. Test that by specifying
different values for this variable, you can change the behavior of rounding on
division of negative numbers.

Cell Nesting

It is possible to nest cells inside one another. A cell that contains other
cells must contain only other cells; by nesting cells, you can give the
configuration a hierarchical structure. Consider the following
definition which is equivalent to the one in LESSON-15-B (lesson-15-c.k):

module LESSON-15-C-SYNTAX
  imports INT-SYNTAX
endmodule

module LESSON-15-C
  imports LESSON-15-C-SYNTAX
  imports INT
  imports BOOL

  configuration <T>
                  <k> . </k>
                  <state>
                    <first> $FIRST:Int </first>
                    <second> $SECOND:Int </second>
                  </state>
                </T>

  rule <k> . => FIRST >Int SECOND </k>
       <first> FIRST </first>
       <second> SECOND </second>
endmodule

Note that we have added some new cells to the configuration declaration:
the <T> cell wraps the entire configuration, and the <state> cell is
introduced around the <first> and <second> cells.

However, we have not changed the rule in this definition. This is because of
a concept in K called configuration abstraction. K allows you to specify
any number of cells in a rule (except zero) in any order you want, and K will
compile the rules into a form that matches the structure of the configuration
specified by the configuration declaration.

Here then, is how this rule would look after the configuration abstraction
has been resolved:

  rule <T>
         <k> . => FIRST >Int SECOND </k>
         <state>
           <first> FIRST </first>
           <second> SECOND </second>
         </state>
       </T>

In other words, K will complete cells to the top of the configuration by
inserting parent cells where appropriate based on the declared structure of
the configuration. This is useful because as a definition evolves, the
configuration may change, but you don't want to have to modify every single
rule each time. Thus, K follows the principle that you should only mention the
cells in a rule that are actually needed in order to accomplish its specific
goal. By following this best practice, you can significantly increase the
modularity of the definition and make it easier to maintain and modify.

Exercise

Modify your definition from the previous exercise in this lesson to wrap the
two cells you have declared in a top cell <T>. You should not have to change
any other rules in the definition.

Cell Variables

Sometimes it is desirable to explicitly match a variable against certain
fragments of the configuration. Because K's configuration is hierarchical,
we can grab subsets of the configuration as if they were just another term.
However, configuration abstraction applies here as well.
In particular, for each cell you specify in a configuration declaration, a
unique sort is assigned for that cell with a single constructor (the cell
itself). The sort name is taken by removing all special characters,
capitalizing the first letter and each letter after a hyphen, and adding the
word Cell at the end. For example, in the above example, the cell sorts are
TCell, KCell, StateCell, FirstCell, and SecondCell. If we had declared
a cell as <first-number>, then the cell sort name would be FirstNumberCell.

You can explicitly reference a variable of one of these sorts anywhere you
might instead write that cell. For example, consider the following rule:

  rule <k> true => S </k>
       (S:StateCell => <state>... .Bag ...</state>)

Here we have introduced two new concepts. The first is the variable of sort
StateCell, which matches the entire <state> part of the configuration. The
second is that we have introduced the concept of ... once again. When a cell
contains other cells, it is also possible to specify ... on either the left,
right, or both sides of the cell term; in this case, all three syntaxes are
equivalent. When they appear on the left-hand side of a rule, they
indicate that we don't care what value any cells not explicitly named might
have. For example, we might write <state>... <first> 0 </first> ...</state> on
the left-hand side of a rule in order to indicate that we want to match the
rule when the <first> cell contains a zero, regardless of what the <second>
cell contains. If we had not included this ellipsis, it would have been a
syntax error, because K would have expected you to provide a value for each of
the child cells.

However, if, as in the example above, the ... appears on the right-hand side
of a rule, this instead indicates that the cells not explicitly mentioned under
the cell should be re-initialized with their default values from the
configuration declaration. In other words, that rule will reset the values of
<first> and <second> to their initial values.

You may note the presence of the phrase .Bag here. You can think of this as
the empty set of cells. It is used as the child of a cell when you want to
indicate that no cells should be explicitly named. We will cover other uses
of this term in later lessons.

Exercises

  1. Modify the definition from the previous exercise in this lesson so that the
    Boolean cell you created is initialized to false. Then add a production
    syntax Stmt ::= Bool ";" Exp, and a rule that uses this Stmt to set the
    value of the Boolean flag. Then add another production
    syntax Stmt ::= "reset" ";" Exp which sets the value of the Boolean flag back
    to its default value via a ... on the right-hand side. You will need to add
    an additional cell around the Boolean cell to make this work.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.16: Maps, Semantic Lists, and Sets.

Lesson 1.16: Maps, Semantic Lists, and Sets

The purpose of this lesson is to explain how to use the data structure sorts
provided by K: maps, lists, and sets.

Maps

The most frequently used type of data structure in K is the map. The sort
provided by K for this purpose is the Map sort, and it is provided in
domains.md in the MAP
module. This type is not (currently) polymorphic. All Map terms are maps that
map terms of sort KItem to other terms of sort KItem. A KItem can contain
any sort except a K sequence. If you need to store such a term in a
map, you can always use a wrapper such as syntax KItem ::= kseq(K).

A Map pattern consists of zero or more map elements (as represented by the
symbol syntax Map ::= KItem "|->" KItem), mixed in any order, separated by
whitespace, with zero or one variables of sort Map. The empty map is
represented by .Map. If all of the bindings for the variables in the keys
of the map can be deterministically chosen, these patterns can be matched in
O(1) time. If they cannot, then each map element that cannot be
deterministically constructed contributes a single dimension of polynomial
time to the cost of the matching. In other words, a single such element is
linear, two are quadratic, three are cubic, etc.

Patterns like the above are the only type of Map pattern that can appear
on the left-hand-side of a rule. In other words, you are not allowed to write
a Map pattern on the left-hand-side with more than one variable of sort Map
in it. You are, however, allowed to write such patterns on the right-hand-side
of a rule. You can also write a function pattern in the key of a map element
so long as all the variables in the function pattern can be deterministically
chosen.

Note the meaning of matching on a Map pattern: a map pattern with no
variables of sort Map will match if the map being matched has exactly as
many bindings as |-> symbols in the pattern. It will then match if each
binding in the map pattern matches exactly one distinct binding in the map
being matched. A map pattern with one Map variable will also match any map
that contains such a map as a subset. The variable of sort Map will be bound
to whatever bindings are left over (.Map if there are no bindings left over).
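
As an illustration, matching the pattern X |-> I M:Map against the map
a |-> 1 b |-> 2 can succeed in two ways: X binds a, I binds 1, and M binds
b |-> 2, or X binds b, I binds 2, and M binds a |-> 1. A pattern with no
Map variable, such as X |-> I alone, fails to match this map, because the
map has two bindings while the pattern has exactly one element.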

Here is an example of a simple definition that implements a very basic
variable declaration semantics using a Map to store the value of variables
(lesson-16-a.k):

module LESSON-16-A-SYNTAX
  imports INT-SYNTAX
  imports ID-SYNTAX

  syntax Exp ::= Id | Int
  syntax Decl ::= "int" Id "=" Exp ";" [strict(2)]
  syntax Pgm ::= List{Decl,""}
endmodule

module LESSON-16-A
  imports LESSON-16-A-SYNTAX
  imports BOOL

  configuration <T>
                  <k> $PGM:Pgm </k>
                  <state> .Map </state>
                </T>

  // declaration sequence
  rule <k> D:Decl P:Pgm => D ~> P ...</k>
  rule <k> .Pgm => . ...</k>

  // variable declaration
  rule <k> int X:Id = I:Int ; => . ...</k>
       <state> STATE => STATE [ X <- I ] </state>

  // variable lookup
  rule <k> X:Id => I ...</k>
       <state>... X |-> I ...</state>

  syntax Bool ::= isKResult(K) [symbol, function]
  rule isKResult(_:Int) => true
  rule isKResult(_) => false [owise]
endmodule

There are several new features in this definition. First, note we import
the module ID-SYNTAX. This module is defined in domains.md and provides a
basic syntax for identifiers. We are using the Id sort provided by this
module in this definition to implement the names of program variables. This
syntax is only imported when parsing programs, not when parsing rules. Later in
this lesson we will see how to reference specific concrete identifiers in a
rule.

Second, we introduce a single new function over the Map sort. This function,
which is represented by the symbol
syntax Map ::= Map "[" KItem "<-" KItem "]", represents the map update
operation. Other functions over the Map sort can be found in domains.md.

Finally, we have used the ... syntax on a cell containing a Map. In this
case, <state>... Pattern ...</state>, <state>... Pattern </state>, and
<state> Pattern ...</state> all mean the same thing: each is equivalent to
writing <state> (Pattern) _:Map </state>.

Consider the following program (a.decl):

int x = 0;
int y = 1;
int a = x;

If we run this program with krun, we will get the following result:

<T>
  <k>
    .
  </k>
  <state>
    a |-> 0
    x |-> 0
    y |-> 1
  </state>
</T>

Note that krun has automatically sorted the collection for you. This doesn't
happen at runtime, so you still get the performance of a hash map, but it will
help make the output more readable.

Exercise

Create a sort Stmt that is a subsort of Decl. Create a production of sort
Stmt for variable assignment in addition to the variable declaration
production. Feel free to use the syntax syntax Stmt ::= Id "=" Exp ";". Write
a rule that implements variable assignment using a map update function. Then
write the same rule using a map pattern. Test your implementations with some
programs to ensure they behave as expected.

Semantic Lists

In a previous lesson, we explained how to represent lists in the AST of a
program. However, this is not the only context where lists can be used. We also
frequently use lists in the configuration of an interpreter in order to
represent certain types of program state. For this purpose, it is generally
useful to have an associative-list sort, rather than the cons-list sorts
provided in Lesson 1.12.

The type provided by K for this purpose is the List sort, and it is also
provided in domains.md, in the LIST module. This type is also not
(currently) polymorphic. Like Map, all List terms are lists of terms of the
KItem sort.

A List pattern in K consists of zero or more list elements (as represented by
the ListItem symbol), followed by zero or one variables of sort List,
followed by zero or more list elements. An empty list is represented by
.List. These patterns can be matched in O(log(N)) time. This is the only
type of List pattern that can appear on the left-hand-side of a rule. In
other words, you are not allowed to write a List pattern on the
left-hand-side with more than one variable of sort List in it. You are,
however, allowed to write such patterns on the right-hand-side of a rule.

Note the meaning of matching on a List pattern: a list pattern with no
variables of sort List will match if the list being matched has exactly as
many elements as ListItem symbols in the pattern. It will then match if each
element in sequence matches the pattern contained in the ListItem symbol. A
list pattern with one variable of sort List operates the same way, except
that it can match any list with at least as many elements as ListItem
symbols, so long as the prefix and suffix of the list match the patterns inside
the ListItem symbols. The variable of sort List will be bound to whatever
elements are left over (.List if there are no elements left over).

The ... syntax is allowed on cells containing lists as well. In this case,
the meaning of <cell>... Pattern </cell> is the same as
<cell> _:List (Pattern) </cell>, the meaning of <cell> Pattern ...</cell>
is the same as <cell> (Pattern) _:List</cell>. Because list patterns with
multiple variables of sort List are not allowed, it is an error to write
<cell>... Pattern ...</cell>.
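
As an illustration, the pattern ListItem(X) L:List matches any non-empty
list, binding X to the first element and L to the remainder, while
L:List ListItem(X) instead binds X to the last element. Combining the two,
ListItem(X) _:List ListItem(Y) matches any list with at least two elements,
binding X to the first element and Y to the last.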

Here is an example of a simple definition that implements a very basic
function-call semantics using a List as a function stack (lesson-16-b.k):

module LESSON-16-B-SYNTAX
  imports INT-SYNTAX
  imports ID-SYNTAX

  syntax Exp ::= Id "(" ")" | Int
  syntax Stmt ::= "return" Exp ";" [strict]
  syntax Decl ::= "fun" Id "(" ")" "{" Stmt "}"
  syntax Pgm ::= List{Decl,""}
  syntax Id ::= "main" [token]
endmodule

module LESSON-16-B
  imports LESSON-16-B-SYNTAX
  imports BOOL
  imports LIST

  configuration <T>
                  <k> $PGM:Pgm ~> main () </k>
                  <functions> .Map </functions>
                  <fstack> .List </fstack>
                </T>

  // declaration sequence
  rule <k> D:Decl P:Pgm => D ~> P ...</k>
  rule <k> .Pgm => . ...</k>

  // function definitions
  rule <k> fun X:Id () { S } => . ...</k>
       <functions>... .Map => X |-> S ...</functions>

  // function call
  syntax KItem ::= stackFrame(K)
  rule <k> X:Id () ~> K => S </k>
       <functions>... X |-> S ...</functions>
       <fstack> .List => ListItem(stackFrame(K)) ...</fstack>

  // return statement
  rule <k> return I:Int ; ~> _ => I ~> K </k>
       <fstack> ListItem(stackFrame(K)) => .List ...</fstack>

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_) => false [owise]
endmodule

Notice that we have declared the production syntax Id ::= "main" [token].
Since we use the ID-SYNTAX module, this declaration is necessary in order to
be able to refer to the main identifier directly in the configuration
declaration. Our <k> cell now contains a K sequence initially: first we
process all the declarations in the program, then we call the main function.

Consider the following program (foo.func):

fun foo() { return 5; }
fun main() { return foo(); }

When we krun this program, we should get the following output:

<T>
  <k>
    5 ~> .
  </k>
  <functions>
    foo |-> return 5 ;
    main |-> return foo ( ) ;
  </functions>
  <fstack>
    .List
  </fstack>
</T> 

Note that we have successfully placed the value returned by the main
function on the <k> cell.

Exercise

Add a term of sort Id to the stackFrame operator to keep track of the
name of the function in that stack frame. Then write a function
syntax String ::= printStackTrace(List) that takes the contents of the
<fstack> cell and pretty prints the current stack trace. You can concatenate
strings with +String in the STRING module in domains.md, and you can
convert an Id to a String with the Id2String function in the ID module.
Test this function by creating a new expression that returns the current stack
trace as a string. Make sure to update isKResult and the Exp sort as
appropriate to allow strings as values.

Sets

The final primary data structure sort in K is a set, i.e., an idempotent
unordered collection where elements are deduplicated. The sort provided by K
for this purpose is the Set sort and it is provided in domains.md in the
SET module. Like maps and lists, this type is not (currently) polymorphic.
Like Map and List, all Set terms are sets of terms of the KItem sort.

A Set pattern has the exact same restrictions as a Map pattern, except that
its elements are treated like keys, and there are no values. It has the same
performance characteristics as well. However, syntactically it is more similar
to the List sort: An empty Set is represented by .Set, but a set element
is represented by the SetItem symbol.

Matching behaves similarly to the Map sort: a set pattern with no variables
of sort Set will match if the set has exactly as many elements as SetItem
symbols, and if each element pattern matches one distinct element in the set.
A set with a variable of sort Set also matches any superset of such a set.
As with map, the elements left over will be bound to the Set variable (or
.Set if no elements are left over).

Like Map, the ... syntax on a set is syntactic sugar for an anonymous
variable of sort Set.
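
As an illustration, matching the pattern SetItem(X) S:Set against the set
SetItem(1) SetItem(2) can bind X to 1 with S bound to SetItem(2), or X to 2
with S bound to SetItem(1); the choice made during concrete execution is not
specified, so rules should not depend on it.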

Here is an example of a simple modification to LESSON-16-A which uses a Set
to ensure that variables are never declared more than once. In practice, you
would likely just use the in_keys symbol over maps to test for this, but
it's still useful as an example of sets in practice:

module LESSON-16-C-SYNTAX
  imports LESSON-16-A-SYNTAX
endmodule

module LESSON-16-C
  imports LESSON-16-C-SYNTAX
  imports BOOL
  imports SET

  configuration <T>
                  <k> $PGM:Pgm </k>
                  <state> .Map </state>
                  <declared> .Set </declared>
                </T>

  // declaration sequence
  rule <k> D:Decl P:Pgm => D ~> P ...</k>
  rule <k> .Pgm => . ...</k>

  // variable declaration
  rule <k> int X:Id = I:Int ; => . ...</k>
       <state> STATE => STATE [ X <- I ] </state>
       <declared> D => D SetItem(X) </declared>
    requires notBool X in D

  // variable lookup
  rule <k> X:Id => I ...</k>
       <state>... X |-> I ...</state>
       <declared>... SetItem(X) ...</declared>

  syntax Bool ::= isKResult(K) [symbol, function]
  rule isKResult(_:Int) => true
  rule isKResult(_) => false [owise]
endmodule

Now if we krun a program containing duplicate declarations, it will get
stuck on the declaration.

Exercises

  1. Modify your solution to Lesson 1.14, Problem 2 and introduce the sorts
    Decls, Decl, and Stmt which include variable and function declaration
    (without function parameters), and return and assignment statements, as well
    as call expressions. Use List and Map to implement these operators, making
    sure to consider the interactions between components, such as saving and
    restoring the environment of variables at each call site. Don't worry about
    local function definitions or global variables for now. Make sure to test the
    resulting interpreter.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.17: Cell Multiplicity and Cell Collections.

Lesson 1.17: Cell Multiplicity and Cell Collections

The purpose of this lesson is to explain how you can create optional cells
and cells that repeat multiple times in a configuration using a feature called
cell multiplicity.

Cell Multiplicity

K allows you to specify attributes for cell productions as part of the syntax
of configuration declarations. Unlike regular productions, which use the []
syntax for attributes, configuration cells use an XML-like attribute syntax:

configuration <k color="red"> $PGM:K </k>

This configuration declaration gives the <k> cell the color red during
unparsing using the color attribute as discussed in
Lesson 1.9.

However, in addition to the usual attributes for productions, there are some
other attributes that can be applied to cells with special meaning. One such
attribute is the multiplicity attribute. By default, each cell that is
declared occurs exactly once in every configuration term. However, using the
multiplicity attribute, this default behavior can be changed. There are two
values that this attribute can have: ? and *.

Optional cells

The first cell multiplicity we will discuss is ?. Similar to a regular
expression language, this attribute tells the compiler that this cell can
appear 0 or 1 times in the configuration. In other words, it is an
optional cell. By default, K does not create optional cells in the initial
configuration, unless that optional cell has a configuration variable inside
it. However, it is possible to override the default behavior and create that
cell initially by adding the additional cell attribute initial="".

K uses the .Bag symbol to represent the absence of any cells in a particular
rule. Consider the following module:

module LESSON-17-A
  imports INT

  configuration <k> $PGM:K </k>
                <optional multiplicity="?"> 0 </optional>

  syntax KItem ::= "init" | "destroy"

  rule <k> init => . ...</k>
       (.Bag => <optional> 0 </optional>)
  rule <k> destroy => . ...</k>
       (<optional> _ </optional> => .Bag)

endmodule

In this definition, when the init symbol is executed, the <optional> cell
is added to the configuration, and when the destroy symbol is executed, it
is removed. Any rule that matches on that cell will only match if that cell is
present in the configuration.
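
For example, a rule like the following sketch (the get construct is
hypothetical) can apply only while the <optional> cell exists; if the cell
has been destroyed, the configuration simply gets stuck at get:

  syntax KItem ::= "get"
  rule <k> get => I ...</k>
       <optional> I </optional>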

Exercise

Create a simple definition with a Stmts sort that is a List{Stmt,""} and
a Stmt sort with the constructors
syntax Stmt ::= "enable" | "increment" | "decrement" | "disable". The
configuration should have an optional cell that contains an integer that
is created with the enable command, destroyed with the disable command,
and its value is incremented or decremented by the increment and decrement
command.

Cell collections

The second type of cell multiplicity we will discuss is *. Similar to a
regular expression language, this attribute tells the compiler that this cell
can appear 0 or more times in the configuration. In other words, it is a
cell collection. Cells with multiplicity * must be the only child of
their parent cell. As a convention, the inner cell is usually named with the
singular form of what it contains, and the outer cell with the plural form, for
example, "thread" and "threads".

All cell collections are required to have the type attribute set to either
Set or Map. A Set cell collection is represented as a set and behaves
internally the same as the Set sort, although it actually declares a new
sort. A Map cell collection is represented as a Map in which the first
subcell of the cell collection is the key and the remaining cells are the
value.

For example, consider the following module:

module LESSON-17-B
  imports INT
  imports BOOL
  imports ID-SYNTAX

  syntax Stmt ::= Id "=" Exp ";" [strict(2)]
                | "return" Exp ";" [strict]
  syntax Stmts ::= List{Stmt,""}
  syntax Exp ::= Id 
               | Int 
               | Exp "+" Exp [seqstrict]
               | "spawn" "{" Stmts "}"
               | "join" Exp ";" [strict]

  configuration <threads>
                  <thread multiplicity="*" type="Map">
                    <id> 0 </id>
                    <k> $PGM:K </k>
                  </thread>
                </threads>
                <state> .Map </state>
                <next-id> 1 </next-id>

  rule <k> X:Id => I:Int ...</k>
       <state>... X |-> I ...</state>
  rule <k> X:Id = I:Int ; => . ...</k>
       <state> STATE => STATE [ X <- I ] </state>
  rule <k> S:Stmt Ss:Stmts => S ~> Ss ...</k>
  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>

  rule <thread>...
         <k> spawn { Ss } => NEXTID ...</k>
       ...</thread>
       <next-id> NEXTID => NEXTID +Int 1 </next-id>
       (.Bag => 
       <thread>
         <id> NEXTID </id>
         <k> Ss </k>
       </thread>)

  rule <thread>...
         <k> join ID:Int ; => I ...</k>
       ...</thread>
       (<thread>
         <id> ID </id>
         <k> return I:Int ; ...</k>
       </thread> => .Bag)

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_) => false [owise]
endmodule

This module implements a very basic fork/join semantics. The spawn expression
spawns a new thread to execute a sequence of statements and returns a thread
id, and the join statement waits until a thread executes return and then
returns the return value of the thread.

Note something quite novel here: the <k> cell is inside a cell of
multiplicity *. Since the <k> cell is just a regular cell (mostly), this
is perfectly allowable. Rules that don't mention a specific thread are
automatically completed to match any thread.
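
For instance, once configuration abstraction has been resolved, the variable
lookup rule above behaves roughly as if it had been written like this sketch,
and can therefore apply to the <k> cell of any thread:

  rule <threads>...
         <thread>...
           <k> X:Id => I:Int ...</k>
         ...</thread>
       ...</threads>
       <state>... X |-> I ...</state>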

When you execute programs in this language, the cells in the cell collection
get sorted and printed like any other collection, but they still display like
cells. Rules in this language also benefit from all the structural power of
cells, allowing you to omit cells you don't care about or complete the
configuration automatically. This allows you to have the power of cells while
still being a collection under the hood.

Exercises

  1. Modify the solution from Lesson 1.16, Problem 1 so that the cell you use to
    keep track of functions in a Map is now a cell collection. Run some programs
    and compare how they get unparsed before and after this change.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.18: Term Equality and the Ternary Operator.

Lesson 1.18: Term Equality and the Ternary Operator

The purpose of this lesson is to introduce how to compare equality of terms in
K, and how to put conditional expressions directly into the right-hand side of
rules.

Term Equality

One major way you can compare whether two terms are equal in K is to simply
match both terms with a variable with the same name. This will only succeed
in matching if the two terms are equal structurally. However, sometimes this
is impractical, and it is useful to have access to a way to actually compare
whether two terms in K are equal. The operator for this is found in
domains.md in the K-EQUAL
module. The operator is ==K and takes two terms of sort K and returns a
Bool. It returns true if they are equal. This includes equality over builtin
types such as Map and Set where equality is not purely structural in
nature. However, it does not include any notion of semantic equality over
user-defined syntax. The inverse symbol for inequality is =/=K.
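
For example, here is a minimal sketch of a rule using ==K (the same
construct is hypothetical):

  syntax KItem ::= same(K, K)
  rule <k> same(A, B) => A ==K B ...</k>

The check same(A, B) rewrites to true exactly when A and B are equal terms,
and to false otherwise.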

Ternary Operator

One way to introduce conditional logic in K is to have two separate rules,
each with a side condition (or one rule with a side condition and another with
the owise attribute). However, sometimes it is useful to explicitly write
a conditional expression directly in the right-hand side of a rule. For this
purpose, K defines one more operator in the K-EQUAL module, which corresponds
to the usual ternary operator found in many languages. Here is an example of its
usage (lesson-18.k):

module LESSON-18
  imports INT
  imports BOOL
  imports K-EQUAL

  syntax Exp ::= Int | Bool | "if" "(" Exp ")" Exp "else" Exp [strict(1)]

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_:Bool) => true

  rule if (B:Bool) E1:Exp else E2:Exp => #if B #then E1 #else E2 #fi
endmodule

Note the symbol on the right-hand side of the final rule. This symbol is
polymorphic: B must be of sort Bool, but E1 and E2 could have been
any sort so long as both were of the same sort, and the sort of the entire
expression becomes equal to that sort. K supports polymorphic built-in
operators, but does not yet allow users to write their own polymorphic
productions.

The behavior of this function is to evaluate the Boolean expression to a
Boolean, then pick one of the two children and return it based on whether the
Boolean is true or false. Please note that it is not a good idea to use this
symbol in cases where one or both of the children is potentially undefined
(for example, an integer expression that divides by zero). While the default
implementation is smart enough to only evaluate the branch that happens to be
picked, this will not be true when we begin to do program verification. If
you need short circuiting behavior, it is better to use a side condition.

Exercises

  1. Write a function in K that takes two terms of sort K and returns an
    Int: the Int should be 0 if the terms are equal and 1 if the terms are
    unequal.

  2. Modify your solution to Lesson 1.16, Problem 1 and introduce an if
    Stmt to the syntax of the language, then implement it using the #if symbol.
    Make sure to write tests for the resulting interpreter.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.19: Debugging with GDB.

Lesson 1.19: Debugging with GDB

The purpose of this lesson is to teach how to debug your K interpreter using
the K-language support provided in GDB.

Caveats

Debugging K definitions using GDB is currently only supported on Linux; the
instructions in this section will not work properly on macOS. Support for
debugging K using LLDB is a work in progress, and this chapter will be updated
when doing so is possible.

Getting started

You will need GDB in order to complete this lesson. If you do not already
have GDB installed, then do so. Steps to install GDB are outlined in
this GDB Tutorial.

The first thing necessary in order to debug a K interpreter in GDB is to
build the interpreter with full debugging support enabled. This can be done
relatively simply. First, make sure you have not passed -O1, -O2, or -O3
to kompile. Second, simply add the command line flags -ccopt -g -ccopt -O1
to kompile. The resulting compiled K definition will be ready to support
debugging.

Note: the 'O' in -O1 is the letter 'O' not the number 0!

Once you have a compiled K definition and a program you wish to debug, you
can start the debugger by passing the --debugger flag to krun. This will
automatically load the program you are executing into GDB and drop you into
a GDB shell ready to start executing the program.

As an example, consider the following K definition (lesson-19-a.k):

module LESSON-19-A
  imports INT

  rule I => I +Int 1
    requires I <Int 100
endmodule

If we compile this definition with
kompile lesson-19-a.k -ccopt -g -ccopt -O1, and run the program 0 in the
debugger with krun -cPGM=0 --debugger, we will see the following output
(roughly):

GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./lesson-19-a-kompiled/interpreter...
warning: File "/home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-a-kompiled/interpreter" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
        add-auto-load-safe-path /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-a-kompiled/interpreter
line to your configuration file "/home/dwightguth/.gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/home/dwightguth/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"
(gdb)

To take full advantage of the GDB features of K, you should follow the first
command listed in this output message and add the corresponding
add-auto-load-safe-path command to your ~/.gdbinit file as prompted.
Please note that the path will be different on your machine than the one
listed above. Adding directories to the "load safe path" effectively tells GDB
to trust those directories. All content under a given directory will be recursively
trusted, so if you want to avoid having to add paths to the "load safe path" every
time you kompile a different K definition, then you can just trust a minimal
directory containing all your kompiled files; however, do not choose a
top-level directory containing arbitrary files, as this amounts to trusting
all of those files and is a security risk. More info on the load safe path
can be found here.

Basic commands

The most basic commands you can execute in the K GDB session are to run your
program or to step through it. The first can be accomplished using GDB's
built-in run command. This will automatically start the program and begin
executing it. It will continue until the program aborts or finishes, or the
debugger is interrupted with Ctrl-C.

Sometimes you want finer-grained control over how you proceed through the
program you are debugging. To step through the rule applications in your
program, you can use the k start and k step GDB commands.

k start is similar to the built-in start command in that it starts the
program and then immediately breaks before doing any work. However, unlike
the start command, which breaks immediately after the main method of
the program begins executing, the k start command will initialize the
rewriter, evaluate the initial configuration, and break immediately prior to
applying any rewrite steps.

In the example above, here is what we see when we run the k start command:

Temporary breakpoint 1 at 0x239210
Starting program: /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-a-kompiled/interpreter .krun-2021-08-13-14-10-50-sMwBkbRicw/tmp.in.01aQt85TaA -1 .krun-2021-08-13-14-10-50-sMwBkbRicw/result.kore
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Temporary breakpoint 1, 0x0000000000239210 in main ()
0x0000000000231890 in step (subject=<k>
  0 ~> .
</k>)
(gdb)

As you can see, we are stopped at the step function in the interpreter.
This function is responsible for taking top-level rewrite steps. The subject
parameter to this function is the current K configuration.

We can step through K rewrite steps one at a time by running the k step
command. By default, this takes a single rewrite step (including any function
rule applications that are part of that step).

Here is what we see when we run that command:

Continuing.

Temporary breakpoint -22, 0x0000000000231890 in step (subject=<k>
  1 ~> .
</k>)
(gdb)

As we can see, we have taken a single rewrite step. We can also pass a number
to the k step command which indicates the number of rewrite steps to take.

Here is what we see if we run k step 10:

Continuing.

Temporary breakpoint -23, 0x0000000000231890 in step (subject=<k>
  11 ~> .
</k>)
(gdb)

As we can see, ten rewrite steps were taken.

Breakpoints

The next important step in debugging an application in GDB is to be able to
set breakpoints. Generally speaking, there are three types of breakpoints we
are interested in when debugging a K semantics: setting a breakpoint when a
particular function is called, setting a breakpoint when a particular rule is
applied, and setting a breakpoint when a side condition of a rule is
evaluated.

The easiest way to do the first two things is to set a breakpoint on the
line of code containing the function or rule.

For example, consider the following K definition (lesson-19-b.k):

module LESSON-19-B
  imports BOOL

  syntax Bool ::= isBlue(Fruit) [function]
  syntax Fruit ::= Blueberry() | Banana()
  rule isBlue(Blueberry()) => true
  rule isBlue(Banana()) => false

  rule F:Fruit => isBlue(F)
endmodule

Once this program has been compiled for debugging, we can run the program
Blueberry(). We can then set a breakpoint that stops when the isBlue
function is called with the following command:

break lesson-19-b.k:4

Here is what we see if we set this breakpoint and then run the interpreter:

(gdb) break lesson-19-b.k:4
Breakpoint 1 at 0x231040: file /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b.k, line 4.
(gdb) run
Starting program: /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b-kompiled/interpreter .krun-2021-08-13-14-20-27-vXOQmV6lwS/tmp.in.fga98yqXlc -1 .krun-2021-08-13-14-20-27-vXOQmV6lwS/result.kore
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, LblisBlue'LParUndsRParUnds'LESSON-19-B'Unds'Bool'Unds'Fruit (_1=Blueberry ( )) at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b.k:4
4         syntax Bool ::= isBlue(Fruit) [function]
(gdb)

As we can see, we have stopped at the point where we are evaluating that
function. The value _1 that is a parameter to that function shows the
value passed to the function by the caller.

We can also break when the isBlue(Blueberry()) => true rule applies by simply
changing the line number to the line number of that rule:

(gdb) break lesson-19-b.k:6
Breakpoint 1 at 0x2af710: file /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b.k, line 6.
(gdb) run
Starting program: /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b-kompiled/interpreter .krun-2021-08-13-14-32-36-7kD0ic7XwD/tmp.in.8JNH5Qtmow -1 .krun-2021-08-13-14-32-36-7kD0ic7XwD/result.kore
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, apply_rule_138 () at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b.k:6
6         rule isBlue(Blueberry()) => true
(gdb)

We can also do the same with a top-level rule:

(gdb) break lesson-19-b.k:9
Breakpoint 1 at 0x2aefa0: lesson-19-b.k:9. (2 locations)
(gdb) run
Starting program: /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b-kompiled/interpreter .krun-2021-08-13-14-33-13-9fC8Sz4aO3/tmp.in.jih1vtxSiQ -1 .krun-2021-08-13-14-33-13-9fC8Sz4aO3/result.kore
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, apply_rule_107 (Var'Unds'DotVar0=<generatedCounter>
  0
</generatedCounter>, Var'Unds'DotVar1=., VarF=Blueberry ( )) at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b.k:9
9         rule F:Fruit => isBlue(F)
(gdb)

Unlike the function rule above, we see several parameters to this function.
These make up the substitution that was matched for the rule. Variables
appear in this substitution only if they are actually used on the right-hand
side of the rule.

Advanced breakpoints

Sometimes it is inconvenient to set the breakpoint based on a line number.
In such cases, it is also possible to set a breakpoint based on the rule label
of a particular rule. Consider the following definition (lesson-19-c.k):

module LESSON-19-C
  imports INT
  imports BOOL

  syntax Bool ::= isEven(Int) [function]
  rule [isEven]: isEven(I) => true requires I %Int 2 ==Int 0
  rule [isOdd]: isEven(I) => false requires I %Int 2 =/=Int 0

endmodule

We will run the program isEven(4). We can set a breakpoint for when a rule
applies by means of the MODULE-NAME.label.rhs syntax:

(gdb) break LESSON-19-C.isEven.rhs
Breakpoint 1 at 0x2afda0: file /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c.k, line 6.
(gdb) run
Starting program: /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c-kompiled/interpreter .krun-2021-08-13-14-40-29-LNNT8YEZ61/tmp.in.ZG93vWCGGC -1 .krun-2021-08-13-14-40-29-LNNT8YEZ61/result.kore
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, LESSON-19-C.isEven.rhs () at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c.k:6
6         rule [isEven]: isEven(I) => true requires I %Int 2 ==Int 0
(gdb)

We can also set a breakpoint for when a rule's side condition is evaluated
by means of the MODULE-NAME.label.sc syntax:

(gdb) break LESSON-19-C.isEven.sc
Breakpoint 1 at 0x2afd70: file /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c.k, line 6.
(gdb) run
Starting program: /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c-kompiled/interpreter .krun-2021-08-13-14-41-48-1BoGfJRbYc/tmp.in.kg4F8cwfCe -1 .krun-2021-08-13-14-41-48-1BoGfJRbYc/result.kore
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, LESSON-19-C.isEven.sc (VarI=4) at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c.k:6
6         rule [isEven]: isEven(I) => true requires I %Int 2 ==Int 0
(gdb) finish
Run till exit from #0  LESSON-19-C.isEven.sc (VarI=4) at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c.k:6
0x00000000002b2662 in LblisEven'LParUndsRParUnds'LESSON-19-C'Unds'Bool'Unds'Int (_1=4) at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c.k:5
5         syntax Bool ::= isEven(Int) [function]
Value returned is $1 = true
(gdb)

Here we have used the built-in GDB command finish to tell us whether the
side condition returned true or not. Note that once again, we see the
substitution that was matched from the left-hand side. Like before, a variable
will only appear here if it is used in the side condition.

Debugging rule matching

Sometimes it is useful to try to determine why a particular rule did or did
not apply. K provides some basic debugging commands which make it easier
to determine this.

Consider the following K definition (lesson-19-d.k):

module LESSON-19-D

  syntax Foo ::= foo(Bar)
  syntax Bar ::= bar(Baz) | bar2(Baz)
  syntax Baz ::= baz() | baz2()

  rule [baz]: foo(bar(baz())) => .K

endmodule

Suppose we try to run the program foo(bar(baz2())). It is obvious from this
example why the rule in this definition will not apply. However, in practice,
such cases are not always obvious. You might look at a rule and not immediately
spot why it didn't apply on a particular term. For this reason, it can be
useful to get the debugger to provide a log about how it tried to match that
term. You can do this with the k match command. If you are stopped after
having run k start or k step, you can obtain this log for any rule after
any step by running the command k match MODULE.label subject for a particular
top-level rule label.

For example, with the baz rule above, we get the following output:

(gdb) k match LESSON-19-D.baz subject
Subject:
baz2 ( )
does not match pattern:
baz ( )

As we can see, it provided the exact subterm which did not match against the
rule, as well as the particular subpattern it ought to have matched against.

This command does not actually take any rewrite steps. In the event that
matching actually succeeds, you will still need to run the k step command
to advance to the next step.

Final notes

In addition to the functionality provided above, you have the full power of
GDB at your disposal when debugging. Some features are not particularly
well-adapted to K code and may require more advanced knowledge of the
term representation or implementation to use effectively, but anything that
can be done in GDB can in theory be done using this debugging functionality.
We suggest you refer to the
GDB Documentation if you
want to try to do something and are unsure as to how.

Exercises

  1. Compile your solution to Lesson 1.18, Problem 2 with debugging support
    enabled and step through several programs you have previously used to test.
    Then set a breakpoint on the isKResult function and observe the state of the
    interpreter when stopped at that breakpoint. Set a breakpoint on the rule for
    addition and run a program that causes it to be stopped at that breakpoint.
    Finally, step through the program until the addition symbol is at the top
    of the K cell, and then use the k match command to report the reason why
    the subtraction rule does not apply. You may need to modify the definition
    to insert some rule labels.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.20: K Backends and the Haskell Backend.

Lesson 1.20: K Backends and the Haskell Backend

The purpose of this lesson is to teach about the multiple backends of K,
in particular the Haskell Backend which is the complement of the backend we
have been using so far.

K Backends

Thus far, we have not discussed the distinction between the K frontend and
the K backends at all. We have simply assumed that if you run kompile on a
K definition, there will be a compiler backend that will allow you to execute
the K definition you have compiled.

K actually has multiple different backends. The one we have been using so far
implicitly, the default backend, is called the LLVM Backend. It is
designed to support efficient, optimized concrete execution and search. It
does this by compiling your K definition to LLVM bitcode and then using LLVM
to generate machine code for it, which is then linked and executed.
However, K is a formal methods toolkit at the end of the day, and the primary
goal many people have when defining a programming language in K is to
ultimately be able to perform more advanced verification on programs in their
programming language.

It is for this purpose that K also provides the Haskell Backend, so called
because it is implemented in Haskell. While we will cover the features of the
Haskell Backend in more detail in the next two lessons, the important thing to
understand is that it is a separate backend which is optimized for more formal
reasoning about programming languages. While it is capable of performing
concrete execution, it does not do so as efficiently as the LLVM Backend.
In exchange, it provides more advanced features.

Choosing a backend

You can choose which backend to use to compile a K definition by means of the
--backend flag to kompile. Omitting this flag is equivalent to specifying
--backend llvm. To use the Haskell Backend instead, you can simply run
kompile --backend haskell on a particular K definition.

As an example, here is a simple K definition that we have seen before in the
previous lesson (lesson-20.k):

module LESSON-20
  imports INT

  rule I => I +Int 1
    requires I <Int 100
endmodule

Previously we compiled this definition using the LLVM Backend, but if we
instead execute the command kompile lesson-20.k --backend haskell, we
will get an interpreter for this K definition that is implemented in Haskell
instead. Unlike the default LLVM Backend, the Haskell Backend is not a
compiler per se: it does not generate new Haskell code corresponding to your
programming language and then compile and execute it. Instead, it is an
interpreter, written in Haskell, that reads the IR generated by kompile and
is capable of interpreting any K definition.

Note that on arm64 macOS (Apple Silicon), there is a known issue with the Compact
library that causes crashes in the Haskell backend. Pass the additional flag
--no-haskell-binary to kompile to resolve this.

Exercise

Try running the program 0 in this K definition on the Haskell Backend and
compare the final configuration to what you would get compiling the same
definition with the LLVM Backend.

Legacy backends

As a quick note, K does provide one other backend, which exists primarily as
legacy code which should be considered deprecated. This is the
Java Backend. The Java Backend is essentially a precursor to the Haskell
Backend. We will not cover this backend in any detail since it is deprecated,
but we still mention it here for completeness.

Exercises

  1. Compile your solution to Lesson 1.18, Problem 2 with the Haskell Backend
    and execute some programs. Compare the resulting configurations with the
    output of the same program on the LLVM Backend. Note that if you are getting
    different behaviors on the Haskell backend, you might have some luck debugging
    by passing --search to krun when using the LLVM backend.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.21: Unification and Symbolic Execution.

Lesson 1.21: Unification and Symbolic Execution

The purpose of this lesson is to teach the basic concepts of symbolic execution
in order to introduce the unique capabilities of the Haskell Backend at a
conceptual level.

Symbolic Execution

Thus far, all of the programs we have run using K have been executed on
concrete configurations. What this means is that the configuration we use to
initialize the K rewrite engine is concrete; in other words, it contains no
logical variables. The LLVM Backend is a concrete execution engine, meaning
that it is only capable of rewriting concrete configurations.

By contrast, the Haskell Backend performs symbolic execution, which is
capable of rewriting any configuration, including those where parts of the
configuration are symbolic, i.e., contain variables or uninterpreted
functions.

Unification

Previously, we have introduced the concept that K rewrite rules operate by
means of pattern matching: the current configuration being rewritten is pattern
matched against the left-hand side of the rewrite rule, and the substitution
is used in order to construct a new term from the right-hand side. In symbolic
execution, we use
unification
instead of pattern matching. To summarize, unification behaves akin to a
two-way pattern matching where both the configuration and the left-hand side
of the rule can contain variables, and the algorithm generates a
most general unifier containing substitutions for the variables in both
which will make both terms equal.

Feasibility

Unification by itself cannot completely solve the problem of symbolic
execution. One task symbolic execution must perform is to identify whether
a particular symbolic term is feasible, that is to say, that there actually
exists a concrete instantiation of that term such that all the logical
constraints on that term can actually be satisfied. The Haskell Backend
delegates this task to Z3, an
SMT solver.
This solver is used to periodically trim configurations that are determined
to be mathematically infeasible.

Symbolic terms

The final component of symbolic execution consists of the task of introducing
symbolic terms into the configuration. This can be done one of two different
ways. First, the term being passed to krun can actually be symbolic. This
is less frequently used because it requires the user to construct an AST
that contains variables, something which our current parsing capabilities are
not well-equipped to do. The second, more common, way of introducing symbolic
terms into a configuration consists of writing rules with an existentially
quantified variable on the right-hand side of the rule that does not appear
on the left-hand side.

In order to prevent users from writing such rules by accident, K requires
that such variables begin with the ? prefix. For example, here is a rule
that rewrites a constructor foo to a symbolic integer:

rule <k> foo => ?X:Int ...</k>

When this rule applies, a fresh variable is introduced into the configuration,
which is then unified against the rules that might apply in order to
symbolically execute that configuration.

ensures clauses

We also introduce here a new feature of K rules that applies when a rule
has this type of variable on the right-hand side: the ensures clause.
An ensures clause is similar to a requires clause and can appear after
a rule body, or after a requires clause. The ensures clause is used to
introduce constraints that might apply to the variable that was introduced by
that rule. For example, we could write the rule above with the additional
constraint that the symbolic integer that was introduced must be less than
five, by means of the following rule:

rule <k> foo => ?X:Int ...</k> ensures ?X <Int 5

Putting it all together

Putting all these pieces together, it is possible to use the Haskell Backend
to perform symbolic reasoning about a particular K module, determining all the
possible states that can be reached by a symbolic configuration.

For example, consider the following K definition (lesson-21.k):

module LESSON-21
    imports INT

    rule <k> 0 => ?X:Int ... </k> ensures ?X =/=Int 0
    rule <k> X:Int => 5  ... </k> requires X >=Int 10
endmodule

When we symbolically execute the program 0, we get the following output
from the Haskell Backend:

    <k>
      5 ~> .
    </k>
  #And
    {
      true
    #Equals
      ?X:Int >=Int 10
    }
  #And
    #Not ( {
      ?X:Int
    #Equals
      0
    } )
#Or
    <k>
      ?X:Int ~> .
    </k>
  #And
    #Not ( {
      true
    #Equals
      ?X:Int >=Int 10
    } )
  #And
    #Not ( {
      ?X:Int
    #Equals
      0
    } )

Note some new symbols introduced by this configuration: #And, #Or, and
#Equals. While andBool, orBool, and ==K represent functions of sort
Bool, #And, #Or, and #Equals are matching logic connectives. We
will discuss matching logic in more detail later in the tutorial, but the basic
idea is that these symbols represent Boolean operators over the domain of
configurations and constraints, as opposed to over the Bool sort.

Notice that the configuration listed above is a disjunction of conjunctions.
This is the most common form of output that can be produced by the Haskell
Backend. In this case, each conjunction consists of a configuration and a set
of constraints. What this conjunction describes, essentially, is a
configuration and a set of information that was derived to be true while
rewriting that configuration.

Similar to how we saw --search in a previous lesson, the reason we have
multiple disjuncts is that there are multiple possible output states
for this program, depending on whether or not the second rule applied. In the
first case, we see that ?X is greater than or equal to 10, so the second rule
applied, rewriting the symbolic integer to the concrete integer 5. In the
second case, we see that the second rule did not apply because ?X is less
than 10. Moreover, because of the ensures clause on the first rule, we know
that ?X is not zero, therefore the first rule will not apply a second time.
If we had omitted this constraint, we would have ended up infinitely applying
the first rule, leading to krun not terminating.

In the next lesson, we will cover how symbolic execution forms the backbone
of deductive program verification in K and how we can use K to prove programs
correct against a specification.

Exercises

  1. Create another rule in LESSON-21 that rewrites odd integers greater than
    ten to a symbolic even integer less than 10 and greater than 0. This rule will
    now apply nondeterministically along with the existing rules. Predict what the
    resulting output configuration will be from rewriting 0 after adding this
    rule. Then run the program and see whether your prediction is correct.

Once you have completed the above exercises, you can continue to
Lesson 1.22: Basics of Deductive Program Verification using K.

K PL Tutorial

Here you will learn how to use the K tool to define languages by means of a series of screencast movies. It is recommended to do these in the indicated order, because K features already discussed in a previous language definition will likely not be rediscussed in later definitions. The screencasts follow quite closely the structure of the files under the tutorial folder in the K tool distribution. If you'd rather follow the instructions there and do the tutorial exercises yourself, then go back to https://kframework.org and download the K tool, if you have not done it already. Or, you can first watch the screencasts below and then do the exercises, or do them in parallel.

K Overview

Make sure you watch the K overview video before you do the K tutorial:

Learning K

[34'46"] Part 1: Defining LAMBDA

Here you will learn how to define a very simple functional language in K and the basics of how to use the K tool. The language is a call-by-value variant of lambda calculus with builtins and mu, and its definition is based on substitution.

[37'07"] Part 2: Defining IMP

Here you will learn how to define a very simple, prototypical textbook C-like imperative language, called IMP, and several new features of the K tool.

[33'10"] Part 3: Defining LAMBDA++

Here you will learn how to define constructs which abruptly change the execution control, as well as how to define functional languages using environments and closures. LAMBDA++ extends the LAMBDA language above with a callcc construct.

[46'46"] Part 4: Defining IMP++

Here you will learn how to refine configurations, how to generate fresh elements, how to tag syntactic constructs and rules, how to exhaustively search the space of non-deterministic or concurrent program executions, etc. IMP++ extends the IMP language above with increment, blocks and locals, dynamic threads, input/output, and abrupt termination.

[17'03"] Part 5: Defining Type Systems

Here you will learn how to define various kinds of type systems following various approaches or styles using K.

[??'??"] Part 6: Miscellaneous Other K Features

Here you will learn a few other K features, and better understand how features that you have already seen work.

  • [??'??"] ...

Learning Language Design and Semantics using K

[??'??"] Part 7: SIMPLE: Designing Imperative Programming Languages

Here you will learn how to design imperative programming languages using K. SIMPLE is an imperative language with functions, threads, pointers, exceptions, multi-dimensional arrays, etc. We first define an untyped version of SIMPLE, then a typed version. For the typed version, we define both a static and a dynamic semantics.

[??'??"] Part 8: KOOL: Designing Object-Oriented Programming Languages

Here you will learn how to design object-oriented programming languages using K. KOOL is an object-oriented language that extends SIMPLE with classes and objects. We first define an untyped version of KOOL, then a typed version, with both a dynamic and a static semantics.

[??'??"] Part 9: FUN: Designing Functional Programming Languages

Here you will learn how to design functional programming languages using K. FUN is a higher-order functional language with general let, letrec, pattern matching, references, lists, callcc, etc. We first define an untyped version of FUN, then a let-polymorphic type inferencer.

[??'??"] Part 10: LOGIK: Designing Logic Programming Languages

Here you will learn how to design a logic programming language using K.

K overview

Go to Youtube mirror, if the above does not work.

Go back to https://kframework.org for further links, the K tool and contact information.

Learning K

We start by introducing the basic features of K by means of a series
of very simple languages. The objective here is neither to learn those
languages nor to study their underlying paradigm, but simply to learn K.

  • LAMBDA: Lambda calculus defined.
  • IMP: A simple imperative language.
  • LAMBDA++: LAMBDA extended with control flow.
  • IMP++: IMP extended with threads and IO.
  • TYPES: LAMBDA type system.

Part 1: Defining LAMBDA

Here you will learn how to define a very simple language in K and the basics
of how to use the K tool. The language is a variant of call-by-value lambda
calculus and its definition is based on substitution. Specifically, you will
learn the following:

  • How to define a module.
  • How to define a language syntax.
  • How to use the defined syntax to parse programs.
  • How to import predefined modules.
  • How to define evaluation strategies using strictness attributes.
  • How to define semantic rules.
  • How the predefined generic substitution works.
  • How to generate PDF and HTML documentation from ASCII definitions.
  • How to include builtins (integers and Booleans) into your language.
  • How to define derived language constructs.

This folder contains several lessons, each adding new features to LAMBDA.

Syntax Modules and Basic K Commands

Here we define our first K module, which contains the initial syntax of the
LAMBDA language, and learn how to use the basic K commands.

Let us create an empty working folder, and open a terminal window
(to the left) and an editor window (to the right). We will edit our K
definition in the right window in a file called lambda.k, and will call
the K tool commands in the left window.

Let us start by defining a K module, containing the syntax of LAMBDA.

K modules are introduced with the keywords module ... endmodule.

The keyword syntax adds new productions to the syntax grammar, using a
BNF-like notation.

Terminals are enclosed in double-quotes, like strings.

You can define multiple productions for the same non-terminal in the same
syntax declaration using the | separator.

Productions can have attributes, which are enclosed in square brackets.

The attribute left tells the parser that we want the lambda application to be
left associative. For example, a b c d will then parse as (((a b) c) d).

The attribute bracket tells the parser not to generate a node for the
parenthesis production in the abstract syntax trees associated to programs.
In other words, we want to allow parentheses to be used for grouping, but we
do not want to bother giving them their obvious (identity) semantics.

In our variant of lambda calculus defined here, identifiers and lambda
abstractions are meant to be irreducible, that is, are meant to be values.
However, so far Val is just another non-terminal, just like Exp,
without any semantic meaning. It will get a semantic meaning later.
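
For orientation, here is a minimal sketch of what lambda.k might look like at
this point (a sketch, not the verbatim tutorial file; Id is a builtin sort):

module LAMBDA
  syntax Val ::= Id
               | "lambda" Id "." Exp
  syntax Exp ::= Val
               | Exp Exp       [left]
               | "(" Exp ")"   [bracket]
endmodule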

After we are done typing our definition in the file lambda.k, we can kompile
it with the command:

kompile lambda.k

If we get no errors then a parser has been generated. This parser will be
called from now on by default by the krun tool. To see whether and how the
parser works, we are going to write some LAMBDA programs and store them in
files with the extension .lambda.

Let us create a file identity.lambda, which contains the identity lambda
abstraction:

lambda x . x

Now let us call krun on identity.lambda:

krun identity.lambda

Make sure you call the krun command from the folder containing your language
definition (otherwise type krun --help to learn how to pass a language
definition as a parameter to krun). The krun command produces the output:

<k>
  lambda x . x
</k>

If you see such an output it means that your program has been parsed (and then
pretty printed) correctly. If you want to see the internal abstract syntax
tree (AST) representation of the parsed program, which we call the K AST, then
type kast in the command instead of krun:

kast identity.lambda

You should normally never need to see this internal representation in your
K definitions, so do not get scared (yes, it is ugly for humans, but it is
very convenient for tools).

Note that krun placed the program in a <k> ... </k> cell. In K, computations
happen only in cells. If you do not define a configuration in your definition,
like we did here, then a configuration will be created automatically for you
which contains only one cell, the default k cell, which holds the program.

Next, let us create a file free-variable-capture.lambda, which contains an
expression that, in order to execute correctly in a substitution-based
semantics of LAMBDA, requires the substitution operation to avoid
variable capture:

a (((lambda x.lambda y.x) y) z)

Next, file closed-variable-capture.lambda shows an expression which also
requires a capture-free substitution, but this expression is closed (that is,
it has no free variables) and all its bound variables are distinct (I believe
this is the smallest such expression):

(lambda z.(z z)) (lambda x.lambda y.(x y))

Finally, the file omega.lambda contains the classic omega combinator
(or closed expression), which is the smallest expression which loops forever
(not now, but after we define the semantics of LAMBDA):

(lambda x.(x x)) (lambda x.(x x))

Feel free to define and parse several other LAMBDA programs to get a feel for
how the parser works. Parse also some incorrect programs, to see how the
parser generates error messages.

In the next lesson we will see how to define semantic rules that iteratively
rewrite expressions over the defined syntax until they evaluate to a result.
This way, we obtain our first programming language defined using K.

Go to Lesson 2, LAMBDA: Module Importing, Rules, Variables

MOVIE (out of date) [4'07"]

Module Importing, Rules, Variables

We here learn how to include a predefined module (SUBSTITUTION), how to
use it to define a K rule (the characteristic rule of lambda calculus),
and how to make proper use of variables in rules.

Let us continue our lambda.k definition started in the previous lesson.

The requires keyword takes a .k file containing language features that
are needed for the current definition, which can be found in the
k-distribution/include/kframework/builtin folder. Thus, the command

require "substitution.k"

says that the subsequent definition of LAMBDA needs the generic substitution,
which is predefined in file substitution.k under the folder
k-distribution/include/kframework/builtin. Note that substitution can itself
be defined in K, although it uses advanced features that we have not yet
discussed in this tutorial, so it may not be easy to understand now.

Using the imports keyword, we can now modify LAMBDA to import the module
SUBSTITUTION, which is defined in the required substitution.k file.

Now we have all the substitution machinery available for our definition.
However, since our substitution is generic, it cannot know which language
constructs bind variables, and what counts as a variable; however, this
information is critical in order to correctly solve the variable capture
problem. Thus, you have to tell the substitution that your lambda construct
is meant to be a binder, and that your Id terms should be treated as variables
for substitution. The former is done using the attribute binder.
By default, binder binds all the variables occurring anywhere in the first
argument of the corresponding syntactic construct within its other arguments;
you can configure which arguments are bound where, but that will be discussed
in subsequent lectures. To tell K which terms are meant to act as variables
for binding and substitution, we have to explicitly subsort the desired syntactic
categories to the builtin KVariable sort.

Now we are ready to define our first K rule. Rules are introduced with the
keyword rule and make use of the rewrite symbol, =>. In our case,
the rule defines the so-called lambda calculus beta-reduction, which
makes use of substitution in its right-hand side, as shown in lambda.k.
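
For reference, here is a sketch of how the pieces described in this lesson fit
into lambda.k (the actual file may differ in details):

requires "substitution.k"

module LAMBDA
  imports SUBSTITUTION
  // ... syntax from the previous lesson, with lambda now marked as a binder:
  syntax Val ::= Id
               | "lambda" Id "." Exp  [binder]
  syntax KVariable ::= Id   // Id terms act as variables for substitution
  rule (lambda X:Id . E:Exp) V:Val => E[V / X]   // beta reduction
endmodule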

By convention, variables that appear in rules start with a capital letter
(the current implementation of the K tool may even enforce that).

Variables may be explicitly tagged with their syntactic category (also called
sort). If tagged, the matching term will be checked at run-time for
membership in the claimed sort. If not tagged, then no check will be made.
The former is safer, but involves the generation of a side condition to the
rule, so the resulting definition may execute slightly slower overall.

In our rule in lambda.k we tagged all variables with their sorts, so we chose
the safest path. Only the V variable really needs to be tagged there,
because we can prove (using other means, not the K tool, as the K tool is not
yet concerned with proving) that the first two variables will always have the
claimed sorts whenever we execute any expression that parses within our
original grammar.

Let us compile the definition and then run some programs. For example,

krun closed-variable-capture.lambda

yields the output

<k>
  lambda y . ((lambda x . (lambda y . (x  y))) y)
</k> 

Notice that only certain programs reduce (some even yield non-termination,
such as omega.lambda), while others do not. For example,
free-variable-capture.lambda does not reduce its second argument expression
to y, as we would expect. This is because the K rewrite rules between syntactic
terms do not apply anywhere they match. They only apply where they have been
given permission to apply by means of appropriate evaluation strategies of language
constructs, which is done using strictness attributes, evaluation contexts,
heating/cooling rules, etc., as discussed in the next lessons.

The next lesson will show how to add to LAMBDA the desired evaluation
strategies using strictness attributes.

Go to Lesson 3, LAMBDA: Evaluation Strategies using Strictness

MOVIE (out of date) [4'03"]

Evaluation Strategies using Strictness

Here we learn how to use the K strict attribute to define desired evaluation
strategies. We will also learn how to tell K which terms are already
evaluated, so it does not attempt to evaluate them anymore and treats them
internally as results of computations.

Recall from the previous lecture that the LAMBDA program
free-variable-capture.lambda was stuck, because K was not given permission
to evaluate the arguments of the lambda application construct.

You can use the attribute strict to tell K that the corresponding construct
has a strict evaluation strategy, that is, that its arguments need to be
evaluated before the semantics of the construct applies. The order of
argument evaluation is purposely unspecified when using strict, and indeed
the K tool allows us to detect all possible non-deterministic behaviors that
result from such intended underspecification of evaluation strategies. We will
learn how to do that when we define the IMP language later in this tutorial;
we will also learn how to enforce a particular order of evaluation.

In order for the above strictness declaration to work effectively and
efficiently, we need to tell the K tool which expressions are meant to be
results of computations, so that it will not attempt to evaluate them anymore.
One way to do it is to make Val a syntactic subcategory of the builtin
KResult syntactic category. Since we use the same K parser to also parse
the semantics, we use the same syntax keyword to define additional syntax
needed exclusively for the semantics (like KResults). See lambda.k.
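
As a sketch, the two changes described above might look like this (the actual
lambda.k attaches the attributes to the productions it already declares):

syntax Exp ::= Exp Exp  [strict, left]   // evaluate both sides of an application
syntax KResult ::= Val                   // values need no further evaluation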

Compile again and then run some programs. They should all work as expected.
In particular, free-variable-capture.lambda now evaluates to the term a y.

We now got a complete and working semantic definition of call-by-value
lambda-calculus. While theoretically correct, our definition is not
easy to use and disseminate. In the next lessons we will learn how to
generate formatted documentation for LAMBDA and how to extend LAMBDA
in order to write human readable and interesting programs.

Go to Lesson 4, LAMBDA: Generating Documentation; Latex Attributes.

MOVIE (out of date) [2'20"]

Generating Documentation; Latex Attributes

In this lesson we learn how to generate formatted documentation from K
language definitions. We also learn how to use Latex attributes to control
the formatting of language constructs, particularly of ones which have a
mathematical flavor and we want to display accordingly.

To enhance readability, we may want to replace the keyword lambda by the
mathematical lambda symbol in the generated documentation. We can control
the way we display language constructs in the generated documentation
by associating Latex attributes with them.

This is actually quite easy. All we have to do is to associate a latex
attribute to the production defining the construct in question, following
the Latex syntax for defining new commands (or macros).

In our case, we associate the attribute latex(\lambda{#1}.{#2}) to the
production declaring the lambda abstraction (recall that in Latex, #n refers
to the n-th argument of the defined new command).
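
Concretely, the lambda production might now read (a sketch):

syntax Val ::= "lambda" Id "." Exp  [binder, latex(\lambda{#1}.{#2})]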

We will later see, in Lesson 9, that we can add arbitrarily complex Latex
comments and headers to our language definitions, which give us maximum
flexibility in formatting our language definitions.

Now we have a simple programming language, with a nice documentation. However,
it is not easy to write interesting programs in this language. Almost all
programming languages build upon existing data-types and libraries. The K
tool provides a few of these (and you can add more).

In the next lesson we show how we can add builtin integers and Booleans to
LAMBDA, so we can start to evaluate meaningful expressions.

Go to Lesson 5, LAMBDA: Adding Builtins; Side Conditions.

MOVIE (out of date) [3'13"]

Adding Builtins; Side Conditions

We have already added the builtin identifiers (sort Id) to LAMBDA expressions,
but those had no operations on them. In this lesson we add integers and
Booleans to LAMBDA, and extend the builtin operations on them into
corresponding operations on LAMBDA expressions. We will also learn how to add
side conditions to rules, to limit the number of instances where they can
apply.

The K tool provides several builtins, which are automatically included in all
definitions. These can be used in the languages that we define, typically by
including them in the desired syntactic categories. You can also define your
own builtins in case the provided ones are not suitable for your language
(e.g., the provided builtin integers and operations on them are arbitrary
precision).

For example, to add integers and Booleans as values to our LAMBDA, we have to
add the productions

syntax Val ::= Int | Bool

Int and Bool are the nonterminals that correspond to these builtins.

To make use of these builtins, we have to add some arithmetic operation
constructs to our language. We prefer to use the conventional infix notation
for these, and the usual precedences (i.e., multiplication and division bind
tighter than addition, which binds tighter than relational operators).
Inspired by SDF, we use > instead of
| to state that all the previous constructs bind tighter than all the
subsequent ones. See lambda.k.

The only thing left is to link the LAMBDA arithmetic operations to the
corresponding builtin operations, when their arguments are evaluated.
This can be easily done using trivial rewrite rules, as shown in lambda.k.
In general, the K tool attempts to uniformly add the corresponding builtin
name as a suffix to all the operations over builtins. For example, the
addition over integers is an infix operation named +Int.
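
As a sketch, the new syntax and the linking rules might look like this (the
actual lambda.k may group the productions and precedences differently):

syntax Exp ::= Exp "*" Exp   [strict, left]
             | Exp "/" Exp   [strict]
             > Exp "+" Exp   [strict, left]
             > Exp "<=" Exp  [strict]

rule I1:Int * I2:Int  => I1 *Int I2
rule I1:Int / I2:Int  => I1 /Int I2   // refined with a side condition below
rule I1:Int + I2:Int  => I1 +Int I2
rule I1:Int <= I2:Int => I1 <=Int I2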

Compile the new lambda.k definition and evaluate some simple arithmetic
expressions. For example, if arithmetic.lambda is (1+2*3)/4 <= 1, then

krun arithmetic.lambda

yields, as expected, true. Note that the parser took the desired operation
precedence into account.

Let us now try to evaluate an expression which performs a wrong computation,
namely a division by zero. Consider the expression arithmetic-div-zero.lambda
which is 1/(2/3). Since division is strict and 2/3 evaluates to 0, this
expression reduces to 1/0, which further reduces to 1 /Int 0 by the rule for
division, which is now stuck (with the current back-end to the K tool).

In fact, depending upon the back-end that we use to execute K definitions and
in particular to evaluate expressions over builtins, 1 /Int 0 can evaluate to
anything. It just happens that the current back-end keeps it as an
irreducible term. Other K back-ends may reduce it to an explicit error
element, or issue a segmentation fault followed by a core dump, or throw an
exception, etc.

To avoid requesting the back-end to perform an illegal operation, we may use a
side condition in the rule of division, to make sure it only applies when the
denominator is non-zero.
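
For reference, the guarded division rule might look like this (a sketch):

rule I1:Int / I2:Int => I1 /Int I2  requires I2 =/=Int 0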

Like in other operational formalisms, the role of the K side
conditions is to filter the instances of the rule. The notion
of a side condition comes from logics, where a sharp distinction is made
between a side condition (cheap) and a premise (expensive). Premises are
usually resolved using further (expensive) logical derivations, while side
conditions are simple (cheap) conditions over the rule meta-variables within
the underlying mathematical domains (which in K can be extended by the user,
as we will see in future lessons). Regarded as a logic, K derives rewrite
rules from other rewrite rules; therefore, the K side conditions cannot
contain other rewrites in them (using =>). This contrasts other rewrite
engines, for example Maude, which
allow conditional rules with rewrites in conditions.
The rationale behind this deliberate restriction in K is twofold:

  • On the one hand, general conditional rules require a complex, and thus slower
    rewrite engine, which starts recursive (sometimes exhaustive) rewrite sessions
    to resolve the rewrites in conditions. In contrast, the side conditions in K
    can be evaluated efficiently by back-ends, for example by evaluating builtin
    expressions and/or by calling builtin functions.
  • On the other hand, the semantic definitional philosophy of K is that rule
    premises are unnecessary, so there is no need to provide support for them.

Having builtin arithmetic is useful, but writing programs with just lambda
and arithmetic constructs is still a pain. In the next two lessons we will
add conditional (if_then_else) and binding (let and letrec) constructs,
which will allow us to write nicer programs.

Go to Lesson 6, LAMBDA: Selective Strictness; Anonymous Variables.

MOVIE (out of date) [4'52"]

Selective Strictness; Anonymous Variables

We here show how to define selective strictness of language constructs,
that is, how to state that certain language constructs are strict only
in some arguments. We also show how to use anonymous variables.

We next define a conditional if construct, which takes three arguments,
evaluates only the first one, and then reduces to either the second or the
third, depending on whether the first one evaluated to true or to false.

K allows us to define selective strictness using the same strict attribute,
by passing it a list of numbers. The numbers correspond to the arguments
in which we want the defined construct to be strict. In our case,

syntax Exp ::= "if" Exp "then" Exp "else" Exp   [strict(1)]

states that the conditional construct is strict in the first argument.

We can now assume that its first argument will eventually reduce to a value, so
we only write the following two semantic rules:

rule if true  then E else _ => E
rule if false then _ else E => E

Thus, we assume that the first argument evaluates to either true or false.

Note the use of the anonymous variable _. We use such variables purely for
structural reasons, to state that something is there but we don't care what.
An anonymous variable is therefore completely equivalent to a normal variable
which is unsorted and different from all the other variables in the rule. If
you use _ multiple times in a rule, they will all be considered distinct.

Compile lambda.k and write and execute some interesting expressions making
use of the conditional construct. For example, the expression

if 2<=1 then 3/0 else 10

evaluates to 10 and will never evaluate 3/0, thus avoiding an unwanted
division-by-zero.

In the next lesson we will introduce two new language constructs, called
let and letrec and conventionally found in functional programming
languages, which will allow us to already write interesting LAMBDA programs.

Go to Lesson 7, LAMBDA: Derived Constructs; Extending Predefined Syntax.

MOVIE (out of date) [2'14"]

Derived Constructs; Extending Predefined Syntax

In this lesson we will learn how to define derived language constructs, that
is, ones whose semantics is defined completely in terms of other language
constructs. We will also learn how to add new constructs to predefined
syntactic categories.

When defining a language, we often want certain language constructs to be
defined in terms of other constructs. For example, a let-binding construct
of the form

let x = e in e'

is nothing but syntactic sugar for

(lambda x . e') e

This can be easily achieved with a rule, as shown in lambda.k.
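
A sketch of that desugaring (variable names are illustrative):

syntax Exp ::= "let" Id "=" Exp "in" Exp
rule let X:Id = E:Exp in E2:Exp => (lambda X . E2) E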

As a side point, which is not very relevant here but good to know, we may
want the desugaring of let to not even count as a computational step, but
as a mere structural rearrangement of the program so that other semantic
rules (beta reduction, in our case) can match and apply.

The K tool allows us to tag rules with the attribute structural, with
precisely the intuition above. You can think of structural rules as a kind
of light rules, almost like macros, or like ones which apply under the hood,
instantaneously. There are several other uses for structural rules in K,
which we will discuss later in this tutorial.

Compile lambda.k and write some programs using let binders.

For example, consider a lets.lambda program which takes arithmetic.lambda
and replaces each integer by a let-bound variable. It should evaluate to
true, just like the original arithmetic.lambda.

Let us now consider a more interesting program, namely one that calculates the
factorial of 10:

let f = lambda x . (
        (lambda t . lambda x . (t t x))
        (lambda f . lambda x . (if x <= 1 then 1 else (x * (f f (x + -1)))))
        x
      )
in (f 10)

This program follows a common technique to define fixed points in untyped
lambda calculus, based on passing a function to itself.

We may not like to define fixed-points following the approach above, because
it requires global changes in the body of the function meant to be recursive,
basically to pass it to itself (f f in our case above). The approach below
isolates the fixed-point aspect of the function in a so-called fixed-point
combinator, which we call fix below, and then applies it to the function
defining the body of the factorial, without any changes to it:

let fix = lambda f . (
          (lambda x . (f (lambda y . (x x y))))
          (lambda x . (f (lambda y . (x x y))))
        )
in let f = fix (lambda f . lambda x .
                (if x <= 1 then 1 else (x * (f (x + -1)))))
   in (f 10)

Although the above techniques are interesting and powerful (indeed, untyped
lambda calculus is in fact Turing complete), programmers will probably not
like to write programs this way.

We can easily define a more complex derived construct, called letrec and
conventionally encountered in functional programming languages, whose semantics
captures the fixed-point idea above. In order to keep its definition simple
and intuitive, we define a simplified variant of letrec, namely one which only
allows us to define one recursive one-argument function. See lambda.k.

There are two interesting observations here.

First, note that we have already in-lined the definition of the fix
combinator in the definition of the factorial, to save one application of the
beta reduction rule (and the involved substitution steps). We could have
in-lined the definition of the remaining let, too, but we believe that the
current definition is easier to read.

Second, note that we extended the predefined Id syntactic category with two
new constants, $x and $y. The predefined identifiers cannot start with
$, so programs that will be executed with this semantics cannot possibly
contain the identifiers $x and $y. In other words, by adding them to Id they
become indirectly reserved for the semantics. This is indeed desirable,
because any possible uses of $x in the body of the function defined using
letrec would be captured by the lambda $x declaration in the definition of
letrec.

Using letrec, we can now write the factorial program as elegantly as it can
be written in a functional language:

letrec f x = if x <= 1 then 1 else (x * (f (x + -1)))
in (f 10)

In the next lesson we will discuss an alternative definition of letrec, based
on another binder, mu, specifically designed to define fixed points.

Go to Lesson 8, LAMBDA: Multiple Binding Constructs.

MOVIE (out of date) [5'10"]

Multiple Binding Constructs

Here we learn how multiple language constructs that bind variables can
coexist. We will also learn about or recall another famous binder besides
lambda, namely mu, which can be used to elegantly define all kinds of
interesting fixed-point constructs.

The mu binder has the same syntax as lambda, except that it replaces
lambda with mu.

Since mu is a binder, in order for substitution to know how to deal with
variable capture in the presence of mu, we have to tell it that mu is a
binding construct, just like lambda. While we are at it, we also give mu
its desired latex attribute.

The intuition for

mu x . e

is that it reduces to e, but each free occurrence of x in e behaves
like a pointer that points back to mu x . e.

With that in mind, let us postpone the definition of mu and instead redefine
letrec F X = E in E' as a derived construct, assuming mu is available. The
idea is to simply regard F as a fixed-point of the function

lambda X . E

that is, to first calculate

mu F . lambda X . E

and then to evaluate E' where F is bound to this fixed-point:

let F = mu F . lambda X . E in E'

This new definition of letrec may still look a bit tricky, particularly
because F is bound twice, but it is much simpler and cleaner than our
previous definition. Moreover, now it is done in a type-safe manner
(this aspect goes beyond our objective in this tutorial).
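
In K, this desugaring can be written roughly as follows (a sketch):

syntax Exp ::= "letrec" Id Id "=" Exp "in" Exp
rule letrec F:Id X:Id = E in E2 => let F = mu F . lambda X . E in E2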

Let us now define the semantic rule of mu.

The semantics of mu is actually disarmingly simple. We just have to
substitute mu X . E for each free occurrence of X in E:

mu X . E => E[(mu X . E) / X]

Compile lambda.k and execute some recursive programs. They should be now
several times faster. Write a few more recursive programs, for example ones
for calculating the Ackermann function, for calculating the number of moves
needed to solve the Hanoi tower problem, etc.

We have defined our first programming language in K, which allows us to
write interesting functional programs. In the next lesson we will learn how
to fully document our language definition, in order to disseminate it, to ship
it to colleagues or friends, to publish it, to teach it, and so on.

Go to Lesson 9, LAMBDA: A Complete and Commented Definition.

MOVIE (out of date) [2'40"]

A Complete and Documented K Definition

In this lesson you will learn how to add formal comments to your K definition,
in order to nicely document it. The generated document can be then used for
various purposes: to ease understanding the K definition, to publish it,
to send it to others, etc.

The K tool allows a literate programming style, where the executable
language definition can be documented by means of annotations. One such
annotation is the latex(_) annotation, where you can specify how to format
the given production when producing Latex output via the --output latex
option to krun, kast, and kprove.

There are three types of comments, which we discuss next.

Ordinary comments

These use // or /* ... */, like in various programming languages. These
comments are completely ignored.

Document annotations

Use the @ symbol right after // or /* in order for the comment to be
considered an annotation and thus be processed by the K tool when it
generates documentation.

As an example, we can go ahead and add such an annotation at the beginning
of the LAMBDA module, explaining how we define the syntax of this language.

Header annotations

Use the ! symbol right after // or /* if you want the comment to be
considered a header annotation, that is, one which goes before
\begin{document} in the generated Latex. You typically need header
annotations to include macros, or to define a title, etc.

As an example, let us set a Latex length and then add a title and an
author to this K definition.

Compile the documentation and take a look at the results. Notice the title.

Feel free to now add lots of annotations to lambda.k.

Then compile and check the result. Depending on your PDF viewer, you
may also see a nice click-able table of contents, with all the sections
of your document. This could be quite convenient when you define large
languages, because it helps you jump to any part of the semantics.

Tutorial 1 is now complete. The next tutorial will take us through the
definition of a simple imperative language and will expose us to more
features of the K framework and the K tool.

MOVIE (out of date) [6'07"]

Part 2: Defining IMP

Here you will learn how to define a very simple imperative language in K
and the basics of how to work with configurations, cells, and computations.
Specifically, you will learn the following:

  • How to define languages using multiple modules.
  • How to define sequentially strict syntactic constructs.
  • How to use K's syntactic lists.
  • How to define, initialize and configure configurations.
  • How the language syntax is swallowed by the builtin K syntactic category.
  • The additional syntax of the K syntactic category.
  • How the strictness annotations are automatically desugared into rules.
  • The first steps of the configuration abstraction mechanism.
  • The distinction between structural and computational rules.

Like in the previous tutorial, this folder contains several lessons, each
adding new features to IMP. Do them in order. Also, make sure you completed
and understood the previous tutorial.

Defining a More Complex Syntax

Here we learn how to define a more complex language syntax than LAMBDA's,
namely the C-like syntax of IMP. Also, we will learn how to define languages
using multiple modules, because we are going to separate IMP's syntax from
its semantics using modules. Finally, we will also learn how to use K's
builtin support for syntactic lists.

The K tool provides modules for grouping language features. In general, we
can organize our languages in arbitrarily complex module structures.
While there are no rigid requirements or even guidelines for how to group
language features in modules, we often separate the language syntax from the
language semantics in different modules.

In our case here, we start by defining two modules, IMP-SYNTAX and IMP, and
import the first in the second, using the keyword imports. As their names
suggest, we will place all IMP's syntax definition in IMP-SYNTAX and all its
semantics in IMP.

Note, however, that K does no more than simply include all the
contents of the imported module in the one which imports it (making sure
that everything is only kept once, even if you import it multiple times).
In other words, there is currently nothing fancy in the K tool's module system.

IMP has six syntactic categories, as shown in imp.k: AExp for arithmetic
expressions, BExp for Boolean expressions, Block for blocks, Stmt for
statements, Pgm for programs and Ids for comma-separated lists of
identifiers. Blocks are special statements, whose role is to syntactically
constrain the conditional statement and the while loop statement to only
take blocks as branches and body, respectively.

There is nothing special about arithmetic and Boolean expressions. They
are given the expected strictness attributes, except for <= and &&,
for demonstration purposes.

The <= is defined to be seqstrict, which means that it evaluates its
arguments in order, from left-to-right (recall that the strict operators
can evaluate their arguments in any, fully interleaved, orders). Like
strict, the seqstrict annotation can also be configured; for example, one
can specify in which arguments and in what order. By default, seqstrict
refers to all the arguments, in their left-to-right order. In our case here,
it is equivalent to seqstrict(1 2).

The && is only strict in its first argument, because we will give it a
short-circuited semantics (its second argument will only be evaluated when
the first evaluates to true). Recall the K tool also allows us to associate
LaTex attributes to constructs, telling the document generator how to display
them. For example, we associate <= the attribute latex({#1}\leq{#2}),
which makes it be displayed \leq everywhere in the generated LaTex
documentation.
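
For reference, the corresponding productions in imp.k look roughly like this:

syntax BExp ::= AExp "<=" AExp  [seqstrict, latex({#1}\leq{#2})]
              | BExp "&&" BExp  [strict(1)]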

In this tutorial we take the liberty of giving the various constructs
parsing precedences that we have already tested and know to work well, so that
we can focus on the semantics here instead of the syntax. In practice, though,
you typically need to experiment with precedences until you obtain the desired
parser.

Blocks are defined using curly brackets, and they can either be empty or
hold a statement.

Nothing special about the IMP statements. Note that ; is an assignment
statement terminator, not a statement separator. Note also that blocks are
special statements.

An IMP program declares a comma-separated list of variables using the keyword
int like in C, followed by a semicolon ;, followed by a statement.
Syntactically, the idea here is that we can wrap any IMP program within a
main(){...} function and get a valid C program. IMP does not allow variable
declarations anywhere else except through this construct, at the top-level of
the program. Other languages provided with the K distribution (see, e.g., the
IMP++ language also discussed in this tutorial) remove this top-level program
construct of IMP and add instead variable declaration as a statement construct,
which can be used anywhere in the program, not only at the top level.

Note how we defined the comma-separated list of identifiers using
List{Id,","}. The K tool provides builtin support for generic syntactic
lists. In general,

syntax B ::= List{A,T}

declares a new non-terminal, B, corresponding to T-separated sequences of
elements of A, where A is a non-terminal and T is a terminal. These
lists can also be empty, that is, IMP programs declaring no variable are also
allowed (e.g., int; {} is a valid IMP program). To instantiate and use
the K builtin lists, you should alias each instance with a (typically fresh)
non-terminal in your syntax, like we do with the Ids nonterminal.
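
Concretely, the declaration in imp.k reads:

syntax Ids ::= List{Id,","}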

Like with other K features, there are ways to configure the syntactic lists,
but we do not discuss them here.

Recall from Tutorial 1 (LAMBDA) that in order for strictness to work well
we also need to tell K which computations are meant to be results. We do
this as well now, in the module IMP: integers and Booleans are K results.
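
Concretely, this is a one-line declaration in imp.k along the lines of:

syntax KResult ::= Int | Bool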

Kompile imp.k and test the generated parser by running some programs.
Since IMP is a fragment of C, you may want to select the C mode in your
editor when writing these programs. This will also give you the feel that
you are writing programs in a real programming language.

For example, here is sum.imp, which sums in sum all numbers up to n:

int n, sum;
n = 100;
sum=0;
while (!(n <= 0)) {
  sum = sum + n;
  n = n + -1;
}

Now krun it and see how it looks parsed in the default k cell.
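
Assuming imp.k and sum.imp are in the current directory, this amounts to:

kompile imp.k
krun sum.imp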

The program collatz.imp tests the Collatz conjecture for all numbers up to
m and accumulates the total number of steps in s:

int m, n, q, r, s;
m = 10;
while (!(m<=2)) {
  n = m;
  m = m + -1;
  while (!(n<=1)) {
    s = s+1;
    q = n/2;
    r = q+q+1;
    if (r<=n) {
      n = n+n+n+1;         // n becomes 3*n+1 if odd
    } else {n=q;}          //        or   n/2 if even
  }
}

Finally, program primes.imp counts in s all the prime numbers up to m:

int i, m, n, q, r, s, t, x, y, z;
m = 10;  n = 2;
while (n <= m) {
  // checking primality of n and setting t to 1 or 0
  i = 2;  q = n/i;  t = 1;
  while (i<=q && 1<=t) {
    x = i;
    y = q;
    // fast multiplication (base 2) algorithm
    z = 0;
    while (!(x <= 0)) {
      q = x/2;
      r = q+q+1;
      if (r <= x) { z = z+y; } else {}
      x = q;
      y = y+y;
    } // end fast multiplication
    if (n <= z) { t = 0; } else { i = i+1;  q = n/i; }
  } // end checking primality
  if (1 <= t) { s = s+1; } else {}
  n = n+1;
}

All the programs above will run once we define the semantics of IMP. If you
want to execute them now, wrap them in a main(){...} function, then compile
and run them with your favorite C compiler.

Before we move to the K semantics of IMP, we would like to make some
clarifications regarding the K builtin parser, kast. Although it is quite
powerful, you should not expect magic from it! While the K parser can parse
many non-trivial languages (see, for example, the KOOL language in
pl-tutorial/2_languages in the K distribution), it was
never meant to be a substitute for real parsers. We often call the syntax
defined in K the syntax of the semantics, to highlight the fact that its
role is to serve as a convenient notation when writing the semantics, not
necessarily as a means to define the concrete syntax of arbitrarily complex
programming languages. See the KERNELC language for an example of how to
connect an external parser for concrete syntax to the K tool.

The above being said, we strongly encourage you to strive to make the
builtin parser work with your desired language syntax! Do not give up
simply because you don't want to deal with syntactic problems. On the
contrary, fight for your syntax! If you really cannot define your desired
syntax because of tool limitations, we would like to know. Please tell us.

Until now we have only seen default configurations. In the next lesson we
will learn how to define a custom K configuration.

Go to Lesson 2, IMP: Defining a Configuration.

MOVIE (out of date) [09'15"]

Defining a Configuration

Here we learn how to define a configuration in K. We also learn how to
initialize and how to display it.

As explained in the overview presentation on K, configurations are quite
important, because all semantic rules match and apply on them.
Moreover, they are the backbone of configuration abstraction, which allows
you to only mention the relevant cells in each semantic rule, the rest of
the configuration context being inferred automatically. The importance of
configuration abstraction will become clear when we define more complex
languages (even in IMP++). IMP does not really need it. K configurations
are constructed making use of cells, which are labeled and can be arbitrarily
nested.

Configurations are defined with the keyword configuration. Cells are
defined using an XML-ish notation stating clearly where the cell starts
and where it ends.

While not enforced by the tool, we typically like to put the entire
configuration in a top-level cell, called T. So let's define it:

configuration <T>...</T>

Cells can have other cells inside. In our case of IMP, we need a cell to
hold the remaining program, a cell which we typically call k, and a cell to
hold the program state. Let us add them:

configuration <T> <k>...</k> <state>...</state> </T>

K also allows us to specify how to initialize a configuration at the same
time as declaring it. All we have to do is to fill in
the contents of the cells with some terms. The syntactic categories of
those terms will also indirectly define the types of the corresponding
cells.

For example, we want the k cell to initially hold the program that is passed
to krun. K provides a builtin configuration variable, called $PGM, which
is specifically designed for this purpose: krun will place its program there
(after parsing it, of course). The K tool allows users to define their own
configuration variables, too, which can be used to develop custom
initializations of program configurations with the help of krun; this can be
quite useful when defining complex languages, but we do not discuss it in
this tutorial.

configuration <T> <k> $PGM </k> <state>...</state>  </T>

Moreover, we want the program to be a proper Pgm term (because we do not
want to allow krun to take fragments of programs, for example, statements).
Therefore, we tag $PGM with the desired syntactic category, Pgm:

configuration <T> <k> $PGM:Pgm </k> <state>...</state>  </T>

Like for other variable tags in K, a run-time check will be performed and the
semantics will get stuck if the passed term is not a well-formed program.

We next tell K that the state cell should be initialized with the empty map:

configuration <T> <k> $PGM:Pgm </k> <state> .Map </state>  </T>

Recall that in K . stands for nothing. However, since there are various
types of nothing, to avoid confusion we can suffix the . with its desired
type. K has several builtin data-types, including lists, sets, bags, and
maps. .Map is the empty map.

Kompile imp.k and run several programs to see how the configuration is
initialized as desired.

When configurations get large, and they do when defining large programming
languages, you may want to color the cells in order to more easily distinguish
them. This can be easily achieved using the color cell attribute, following
again an XML-ish style:

configuration <T color="yellow">
                <k color="green"> $PGM:Pgm </k>
                <state color="red"> .Map </state>
              </T>

In the next lesson we will learn how to write rules that involve cells.

Go to Lesson 3, IMP: Computations, Results, Strictness; Rules Involving Cells.

MOVIE (out of date) [04'21"]

Computations, Results, Strictness; Rules Involving Cells

In this lesson we will learn about the syntactic category K of computations,
about how strictness attributes are in fact syntactic sugar for rewrite rules
over computations, and why it is important to tell the tool which
computations are results. We will also see a K rule that involves cells.

K Computations

Computation structures, or more simply computations, extend the abstract
syntax of your language with a list structure using ~> (read followed by
or and then, and written \curvearrowright in LaTeX) as a separator.
K provides a distinguished sort, K, for computations. The extension of the
abstract syntax of your language into computations is done automatically by
the K tool when you declare constructs using the syntax keyword, so the K
semantic rules can uniformly operate only on terms of sort K. The intuition
for computation structures of the form

t1 ~> t2 ~> ... ~> tn

is that the listed tasks are to be processed in order. The initial
computation typically contains the original program as its sole task, but
rules can then modify it into task sequences, as seen shortly.

Strictness in Theory

The strictness attributes, used as annotations to language constructs,
actually correspond to rules over computations. For example, the
strict(2) attribute of the assignment statement corresponds to the
following two opposite rules (X ranges over Id and A over AExp):

X=A; => A ~> X=[];
A ~> X=[]; => X=A;

The first rule pulls A from the syntactic context X=A; and schedules it
for processing. The second rule plugs A back into its context.
Inspired by the chemical abstract machine, we call rules of the first
type above heating rules and rules of the second type cooling rules.
Similar rules are generated for other arguments in which operations are
strict. Iterative applications of heating rules eventually bring to the
top of the computation atomic tasks, such as a variable lookup, or a
builtin operation, which then make computational progress by means of other
rules. Once progress is made, cooling rules can iteratively plug the result
back into context, so that heating rules can pick another candidate for
reduction, and so on and so forth.

When operations are strict only in some of their arguments, the corresponding
positions of the arguments in which they are strict are explicitly enumerated
in the argument of the strict attribute, e.g., strict(2) like above, or
strict(2 3) for an operation strict in its second and third arguments, etc.
If an operation is simply declared strict then it means that it is strict
in all its arguments. For example, the strictness of addition yields:

A1+A2 => A1 ~> []+A2
A1 ~> []+A2 => A1+A2
A1+A2 => A2 ~> A1+[]
A2 ~> A1+[] => A1+A2

It can be seen that such heating/cooling rules can easily lead to
non-determinism, since the same term may be heated many different ways;
these different evaluation orders may lead to different behaviors in some
languages (not in IMP, because its expressions do not have side effects,
but we will experiment with non-determinism in its successor, IMP++).
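
For instance, here is one possible heating/computing/cooling sequence for
the term (1+2)+x (one possible order among several):

(1+2)+x  =>  1+2 ~> []+x    // heat the first argument
         =>  3 ~> []+x      // the addition rule computes 1+2
         =>  3+x            // cool: plug the result back into context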

A similar desugaring applies to sequential strictness, declared with the
keyword seqstrict. While the order of arguments of strict is irrelevant,
it matters in the case of seqstrict: they are to be evaluated in the
specified order; if no arguments are given, then they are assumed by default
to be evaluated from left-to-right. For example, the default heating/cooling
rules associated to the sequentially strict <= construct above are
(A1, A2 range over AExp and I1 over Int):

A1<=A2 => A1 ~> []<=A2
A1 ~> []<=A2 => A1<=A2
I1<=A2 => A2 ~> I1<=[]
A2 ~> I1<=[] => I1<=A2

In other words, A2 is only heated/cooled after A1 is already evaluated.

While the heating/cooling rules give us a nice and uniform means to define
all the various allowable ways in which a program can evaluate, all based
on rewriting, the fact that they are reversible comes with a serious practical
problem: they make the K definitions unexecutable, because they lead to
non-termination.

Strictness in Practice; K Results

To break the reversibility of the theoretical heating/cooling rules, and,
moreover, to efficiently execute K definitions, the current implementation of
the K tool relies on users giving explicit definitions of their languages'
results.

The K tool provides a predicate isKResult, which is automatically defined
as we add syntactic constructs to KResult (in fact the K tool defines such
predicates for all syntactic categories, which are used, for example, as
rule side conditions to check user-declared variable memberships, such as
V:Val stating that V belongs to Val).

The kompile tool, depending upon what it is requested to do, changes the
reversible heating/cooling rules corresponding to evaluation strategy
definitions (e.g., those corresponding to strictness attributes) to avoid
non-termination. For example, when one is interested in obtaining an
executable model of the language (which is the default compilation mode of
kompile), then heating is performed only when the to-be-pulled syntactic
fragment is not a result, and the corresponding cooling only when the
to-be-plugged fragment is a result. In this case, e.g., the heating/cooling
rules for assignment are modified as follows:

X=A; => A ~> X=[];  requires notBool isKResult(A)
A ~> X=[]; => X=A;  requires isKResult(A)

Note that non-termination of heating/cooling is avoided now. The only thing
lost is the number of possible behaviors that a program can manifest, but
this is irrelevant when all we want is one behavior.
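
The heating/cooling rules for the other strict operations are modified
analogously; for example, by analogy with the assignment rules above, for
the first argument of + we would get:

A1+A2 => A1 ~> []+A2  requires notBool isKResult(A1)
A1 ~> []+A2 => A1+A2  requires isKResult(A1)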

As will be discussed in the IMP++ tutorial, the heating/cooling rules are
modified differently by kompile when we are interested in other aspects
of the language definition, such as, for example, a searchable model that
comprises all program behaviors. This latter model is obviously more general
from a theoretical perspective, but, in practice, it is also slower to execute.
The kompile tool strives to give you the best model of the language for the
task you are interested in.

Can't Results be Inferred Automatically?

This is a long story, but the short answer is: no! Maybe in some cases
it is possible, but we prefer not to attempt it in the K tool. For example,
you most likely do not want any stuck computation to count as a result,
since some of them can happen simply because you forgot a semantic rule that
could have further reduced it! Besides, in our experience with defining large
languages, it is quite useful to take your time and think of what the results
of your language's computations are. This fact in itself may help you improve
your overall language design. We typically do it at the same time with
defining the evaluation strategies of our languages. Although in theory K
could infer the results of your language as the stuck computations, based on
the above we have deliberately decided to not provide this feature, in spite
of requests from some users. So you currently do have to explicitly define
your K results if you want to effectively use the K tool. Note, however, that
theoretical definitions, not meant to be executed, need not worry about
defining results (that's because in theory semantic rules apply modulo the
reversible heating/cooling rules, so results are not necessary).

A K Rule Involving Cells

All our K rules so far in the tutorial were of the form

rule left => right requires condition

where left and right were syntactic, or more generally computation, terms.

Here is our first K rule explicitly involving cells:

rule <k> X:Id => I ...</k> <state>... X |-> I ...</state>

Recall that the k cell holds computations, which are sequences of tasks
separated by ~>. Also, the state cell holds a map, which is a set of
bindings, each binding being a pair of computations (currently, the
K builtin data-structures, like maps, are untyped; or, said differently,
they are all over the type of computations, K).

Therefore, the two cells mentioned in the rule above hold collections
of things, ordered or not. The ...s, which we also call cell frames,
stand for more stuff there, which we do not care about.

The rewrite relation => is allowed in K to appear anywhere in a term, its
meaning being that the corresponding subterm is rewritten as indicated in the
shown context. We say that K's rewriting is local.

The rule above says that if the identifier X is the first task in the k
cell, and if X is bound to I somewhere in the state, then X rewrites
to I locally in the k cell. Therefore, IMP variables need to be already
declared when looked up.

Of course, the K rule above can be translated into an ordinary rewrite rule
of the form

rule <k> X ~> Rest </k> <state> Before (X |-> I) After </state>
  => <k> I ~> Rest </k> <state> Before (X |-> I) After </state>

Besides being more verbose and thus tedious to write, this ordinary rule
is also more error-prone; for example, we may forget the Rest variable
in the right-hand-side, etc. Moreover, the concurrent semantics of K
allows for its rules to be interpreted as concurrent transactions, where
the context is the read-only component of the transaction, while the
subterms which are rewritten are the read/write component of the transaction;
thus, K rule instances can apply concurrently if they only overlap
on read-only parts, while they cannot if regarded as ordinary rewrite logic
rules. Note: our current implementation of the K tool is not concurrent,
so K rules are in fact desugared as normal rewrite rules in the K tool.

Kompile imp.k using a documentation option and check out how the K rule
looks in the generated document. The ... frames are displayed as cell
tears, metaphorically implying that those parts of the cells that we
do not care about are torn away. The rewrite relation is replaced by a
horizontal line: specifically, the subterm which rewrites, X, is
underlined, and its replacement is written underneath the line.

In the next lesson we define the complete K semantics of IMP and
run the programs we parsed in the first lesson.

Go to Lesson 4, IMP: Configuration Abstraction, Part 1; Types of Rules.

MOVIE (out of date) [10'30"]

Configuration Abstraction, Part 1; Types of Rules

Here we will complete the K definition of IMP and, while doing so, we will
learn the very first step of what we call configuration abstraction, and
the semantic distinction between structural and computational rules.

The IMP Semantic Rules

Let us add the remaining rules, in the order in which the language constructs
were defined in IMP-SYNTAX.

The rules for the arithmetic and Boolean constructs are self-explanatory.
Note, however, that K will infer the correct sorts of all the variables in
these rules, because they appear as arguments of the builtin operations
(_+Int_, etc.). Moreover, the inferred sorts will be enforced dynamically.
Indeed, we do not want to apply the rule for addition, for example, when the
two arguments are not integers. In the rules for &&, although we prefer to
not do it here for simplicity, we could have eliminated the dynamic check by
replacing B (and similarly for _) with B:K. Indeed, it can be shown
that whenever any of these rules apply, B (or _) is a BExp anyway.
That's because there is no rule that can touch such a B (or _); this
will become clearer shortly, when we discuss the first step of configuration
abstraction. Therefore, since we know that B will be a BExp anyway, we
could save the time it takes to check its sort; such times may look minor,
but they accumulate, so some designers may prefer to avoid run-time checks
whenever possible.

The block rules are trivial. However, the rule for non-empty blocks is
semantically correct only because we do not have local variable declarations
in IMP. We will have to change this rule in IMP++.

The assignment rule has two =>: one in the k cell dissolving the
assignment statement, and the other in the state cell updating the value of
the assigned variable. Note that the one in the state is surrounded by
parentheses: (_ => I). That is because => is greedy: it matches as much
as it can to the left and to the right, until it reaches the cell boundaries
(closed or open). If you want to limit its scope, or for clarity, you can use
parentheses like here.
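
For reference, the assignment rule in imp.k looks along these lines:

rule <k> X = I:Int; => . ...</k> <state>... X |-> (_ => I) ...</state>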

The rule for sequential composition simply desugars S1 S2 into S1 ~> S2.
Indeed, the two have exactly the same semantics. Note that statements
evaluate to nothing (.), so once S1 is processed in S1 ~> S2, then the
next task is automatically S2, without wasting any step for the transition.
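
Concretely, the desugaring rule is essentially the following (the
structural attribute is discussed at the end of this lesson):

rule S1:Stmt S2:Stmt => S1 ~> S2  [structural]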

The rules for the conditional and while statements are clear. One thing to
keep in mind now is that the while unrolling rule will not apply
indefinitely in the positive branch of the resulting conditional, because
of K's configuration abstraction, which will be discussed shortly.

An IMP program declares a set of variables and then executes a
statement in the state obtained after initializing all those variables
to 0. The rules for programs initialize the declared variables one by one,
checking also that there are no duplicates. We check for duplicates only for
demonstration purposes, to illustrate the keys predefined operation that
returns the set of keys of a map, and the set membership operation in.
In practice, we typically define a static type checker for our language,
which we execute before the semantics and reject inappropriate programs.

The use of the .Ids in the second rule is not necessary. We could have
written int; S instead of int .Ids; S and the K tool would parse and
kompile the definition correctly, because the semantics is parsed with the
same parser used for programs. However, we typically prefer to
explicitly write the nothing values in the semantics, for clarity;
the parser has been extended to accept these. Note that the first rule
matches the entire k cell, because int_;_ is the top-level program
construct in IMP, so there is nothing following it in the computation cell.
The anonymous variable stands for the second argument of this top-level program
construct, not for the rest of the computation. The second rule could have
also been put in a complete k cell, but we preferred not to, for simplicity.
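
For reference, here is a sketch of these two rules, close to what imp.k
contains:

rule <k> int (X,Xs => Xs);_ </k>
     <state> Rho:Map (.Map => X |-> 0) </state>
  requires notBool (X in keys(Rho))

rule int .Ids; S => S  [structural]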

Our IMP semantics is now complete, but there are a few more things that we
need to understand and do.

Configuration Abstraction, Part 1

First, let us briefly discuss the very first step of configuration abstraction.
In K, all semantic rules are in fact rules between configurations. As will
soon be explained in the IMP++ tutorial, the declared configuration cell
structure is used to automatically complete the missing configuration parts
in rules.
However, many rules do not involve any cells, being rules between syntactic
terms (of sort K); for example, we had only three rules involving cells in our
IMP semantics. In this case, the k cell will be added automatically and the
actual rewrite will happen on top of the enclosed computation. For example,
the rule for the while loop is automatically translated into the following:

rule <k> while (B) S => if (B) {S while (B) S} else {} ...</k>

Since the first task in computations is what needs to be done next, the
intuition for this rule completion is that the syntactic transition
only happens when the term to rewrite is ready for processing. This explains,
for example, why the while loop unrolling does not indefinitely apply in the
positive branch of the conditional: the inner while loop is not ready for
evaluation yet. We call this rule completion process, as well as other
similar ones, configuration abstraction. That is because the incomplete
rule abstracts away the configuration structure, thus being easier to read.
As seen soon when we define IMP++, configuration abstraction is not only a
user convenience; it actually significantly increases the modularity of our
definitions. The k-cell-completion is only the very first step, though.

If you really want certain rewrites over syntactic terms to apply
anywhere they match, then you should tag the rule with the attribute
anywhere, which was discussed in Tutorial 1, Lesson 2.5.

Structural vs. Computational Rules

The K rules are of two types: structural and computational. Intuitively,
structural rules rearrange the configuration so that computational rules can
apply. Structural rules therefore do not count as computational steps. A K
semantics can be thought of as a generator of transition systems, one for each
program. It is only the computational rules that create steps, or transitions,
in the corresponding transition system, the structural rules being unobservable
at this level. By default, rules are all assumed computational, except for
the implicit heating/cooling rules that define evaluation strategies of
language constructs, which are assumed structural. If you want to explicitly
make a rule structural, then you should include the tag (or attribute)
structural in square brackets right after the rule. These attributes may be
taken into account by different K tools, so it is highly recommended to spend
a moment or two after each rule and think whether you want it to be structural
or computational.

Let us do it. We want the lookup and the arithmetic and Boolean construct
rules to be computational, because they make computational progress whenever
they apply. However, the block rules can be very well structural, because
we can regard them simply as syntactic grouping constructs. In general,
we want to have as few computational rules as possible, because we want
the resulting transition systems to be smaller for analysis purposes, but
not so few that we lose behaviors. For example, making the block rules structural
loses no meaningful behaviors. Similarly, the sequential composition,
the while loop unrolling, and the no-variable declaration rules can all
safely be structural.
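
For example, the two block rules can be tagged as follows:

rule {} => .   [structural]
rule {S} => S  [structural]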

Kompile and then krun the programs that you only parsed in Lesson 1. They
should all execute as expected. The state cell shows the final state
of the program. The k cell shows the final code contents, which should be
empty whenever the IMP program executes correctly.

Kompile also with the documentation option and take a look at the generated
documentation. The assignment rule should particularly be of interest,
because it contains two local rewrites.

In the next lesson we comment the IMP definition and conclude this tutorial.

Go to Lesson 5, IMP: Completing and Documenting IMP.

MOVIE (out of date) [09'16"]

Completing and Documenting IMP

We here learn no new concepts, but it is a good moment to take a break
and contemplate what we learned so far.

Let us add lots of formal annotations to imp.k.

Once we are done with the annotations, we kompile with the documentation
option and then take a look at the produced document. We often call these
documents language posters. Depending on how much information you add to
these language posters, they can serve as standalone, formal presentations
of your languages. For example, you can print them as large posters and
post them on the wall, or in poster sessions at conferences.

This completes our second tutorial. The next tutorials will teach us more
features of the K framework, such as how to define languages with complex
control constructs (like callcc), languages which are concurrent, and so on.

MOVIE (out of date) [03'45"]

Part 3: Defining LAMBDA++

Here you will learn how to define language constructs which abruptly change
the execution control flow, and how to define language semantics following
an environment/store style. Specifically, you will learn the following:

  • How to define constructs like callcc, which allow you to take snapshots of
    program executions and to go back in time at any moment.
  • How to define languages in an environment/store style.
  • Some basic notions about the use of closures and closure-like semantic
    structures to save and restore execution environments.
  • Some basic intuitions about reusing existing semantics in new languages,
    as well as some of the pitfalls in doing so.

Abrupt Changes of Control

Here we add call-with-current-continuation (callcc) to the definition of
LAMBDA completed in Tutorial 1, and call the resulting language LAMBDA++.
While doing so, we will learn how to define language constructs that
abruptly change the execution control flow.

Take over the lambda.k definition from Lesson 8 in Part 1 of this Tutorial,
which is the complete definition of the LAMBDA language, but without the
comments.

callcc is a good example for studying the capabilities of a framework to
support abrupt changes of control, because it is one of the most
control-intensive language constructs known. Scheme is probably the first
programming language that incorporated the callcc construct, although
similar constructs have been recently included in many other languages in
one form or another.

Here is a quick description: callcc e passes the remaining computation
context, packaged as a function k, to e (which is expected to be a function);
if during its evaluation e passes any value to k, then the current
execution context is discarded and replaced by the one encoded by k and
the value is passed to it; if e evaluates normally to some value v and
passes nothing to k in the process, then v is returned as a result of
callcc e and the execution continues normally. For example, we want the
program callcc-jump.lambda:

(callcc (lambda k . ((k 5) + 2))) + 10

to evaluate to 15, not 17! Indeed, the computation context [] + 10 is
passed to callcc's argument, which then sends it a 5, so the computation
resumes to 5 + 10. On the other hand, the program callcc-not-jump.lambda

(callcc (lambda k . (5 + 2))) + 10

evaluates to 17.

If you like playing games, you can metaphorically think of callcc e as
saving your game state in a file and passing it to your friend e.
Then e can decide at some moment to drop everything she was doing, load
your game and continue to play it from where you were.

The behavior of many popular control-changing constructs can be obtained
using callcc. The program callcc-return.lambda shows, for example, how to
obtain the behavior of a return statement, which exits the current execution
context inside a function and returns a value to the caller's context:

letrec f x = callcc (lambda return . (
  f (if (x <= 0) then ((return 1) / 0) else 2)
))
in (f -3)

This should evaluate to 1, in spite of the recursive call to f
and of the division by zero! Note that return is nothing but a variable
name, but one which is bound to the current continuation at the beginning of
the function execution. As soon as 1 is passed to return, the computation
jumps back in time to where callcc was defined! Change -3 to 3 and the
program will loop forever.

callcc is quite a powerful and beautiful language construct, although one
which is admittedly hard to give semantics to in some frameworks.
But not in K 😃 Here is the entire K syntax and semantics of callcc:

syntax Exp ::= "callcc" Exp  [strict]
syntax Val ::= cc(K)
rule <k> (callcc V:Val => V cc(K)) ~> K </k>
rule <k> cc(K) V ~> _ =>  V ~> K </k>

Let us first discuss the annotated syntax. We declared callcc strict,
because its argument may not necessarily be a function yet, so it may need
to be evaluated. As explained above, we need to encode the remaining
computation somehow and pass it to callcc's argument. More specifically,
since LAMBDA is call-by-value, we have to encode the remaining computation as
a value. We do not want to simply subsort computations to Val, because there
are computations which we do not want to be values. A simple solution to
achieve our goal here is to introduce a new value construct, say cc (from
current-continuation), which holds any computation.

Note that, inspired by SDF,
K allows you to define the syntax of helping semantic operations, like cc,
more compactly. Typically, we do not need a fancy syntax for such operators;
all we need is a name, followed by open parenthesis, followed by a
comma-separated list of arguments, followed by closed parenthesis. If this
is the syntax that you want for a particular construct, then K allows you to
drop all the quotes surrounding the terminals, as we did above for cc.

The semantic rules do exactly what the English semantics of callcc says.
Note that here, unlike in our definition of LAMBDA in Tutorial 1, we had
to mention the cell <k/> in our rules. This is because we need to make sure
that we match the entire remaining computation, not only a fragment of it!
For example, if we replace the two rules above with

rule (callcc V:Val => V cc(K)) ~> K
rule cc(K) V ~> _ =>  V ~> K

then we get a callcc which is allowed to non-deterministically pick a
prefix of the remaining computation and pass it to its argument, and then
when invoked within its argument, a non-deterministic prefix of the new
computation is discarded and replaced by the saved one. Wow, that would
be quite a language! Would you like to write programs in it? 😃

Consequently, in K we can abruptly change the execution control flow of a
program by simply changing the contents of the <k/> cell. This is one of
the advantages of having an explicit representation of the execution context,
like in K or in reduction semantics with evaluation contexts. Constructs like
callcc are very hard and non-elegant to define in frameworks such as SOS,
because those implicitly represent the execution context as proof context,
and the latter cannot be easily changed.

Now that we know how to handle cells in configurations and use them in rules,
in the next lesson we take a fresh look at LAMBDA and define it using
an environment-based style, which avoids the complexity of substitution
(e.g., having to deal with variable capture) and is closer in spirit to how
functional languages are implemented.

Go to Lesson 2, LAMBDA++: Semantic (Non-Syntactic) Computation Items.

MOVIE (out of date) [6'28"]

Semantic (Non-Syntactic) Computation Items

In this lesson we start another semantic definition of LAMBDA++, which
follows a style based on environments instead of substitution. In terms of
K, we will learn how easy it is to add new items to the syntactic category
of computations K, even ones which do not have a syntactic nature.

An environment binds variable names of interest to locations where their
values are stored. The idea of environment-based definitions is to maintain
a global store mapping locations to values, and then have environments
available when we evaluate expressions telling where the variables are
located in the store. Since LAMBDA++ is a relatively simple language, we
only need to maintain one global environment. Following a style similar
to that of IMP, we place all cells into a top cell T:

configuration <T>
                <k> $PGM:Exp </k>
                <env> .Map </env>
                <store> .Map </store>
              </T>

Recall that $PGM is where the program is placed by krun after parsing. So
the program execution starts with an empty environment and an empty store.

In environment-based definitions of lambda-calculi, lambda abstractions
evaluate to so-called closures:

rule <k> lambda X:Id . E => closure(Rho,X,E) ...</k>
     <env> Rho </env>

A closure is like a lambda abstraction, but it also holds the environment
in which it was declared. This way, when invoked, a closure knows where to
find in the store the values of all the variables that its body expression
refers to. We will define the lookup rule shortly.

Therefore, unlike in the substitution-based definitions of LAMBDA and
LAMBDA++, neither the lambda abstractions nor the identifiers are values
anymore here, because they both evaluate further: lambda abstractions to
closures and identifiers to their values in the store. In fact, the only
values at this moment are the closures, and they are purely semantic entities,
which cannot be used explicitly in programs. That's why we modified the
original syntax of the language to no longer include a Val syntactic
category, and that's why we need to add closures as values now; as
before, we add a Val syntactic category which is subsorted
to KResult. In general, whenever you have any strictness attributes,
you should also define some K results.
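
Concretely, the declarations look along these lines:

syntax Val ::= closure(Map,Id,Exp)
syntax KResult ::= Val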

Invoking a closure is a bit more involved than the substitution-based
beta-reduction: we need to switch to the closure's environment, then create a
new, or fresh, binding for the closure's parameter to the value passed to the
closure, then evaluate the closure's body, and then switch back to the
caller's environment, which needs to be stored somewhere in the meanwhile.
We can do all these with one rule:

rule <k> closure(Rho,X,E) V:Val => E ~> Rho' ...</k>
     <env> Rho' => Rho[X <- !N] </env>
     <store>... .Map => (!N:Int |-> V) ...</store>

Therefore, we atomically do all the following:

  • switch the computation to the closure's body, E, followed by a
    caller-environment-recovery task Rho' (note that Rho' is the
    current environment),
  • generate a fresh location !N (the ! is important, we discuss it below),
    bind X to !N in closure's environment and switch the current environment
    Rho' to that one,
  • write the value passed to the closure, V, at location !N.

This was the most complex K rule we've seen so far in the tutorial. Note,
however, that this one rule achieves a lot. It is, in fact, quite compact
considering how much it does. Note also that everything that this K rule
mentions is needed also conceptually in order to achieve this task, so it
is minimal from that point of view. That would not be the case if we
used, instead, a conventional rewrite rule, because we would have had to
mention the remaining store, say Sigma, in both sides of the rule, to say
it stays unchanged. Here we just use ....

The declaration of the fresh variable above, !N, is new and needs
some explanation. First, note that !N appears only in the right-hand-side
terms in the rule, that is, it is not matched when the rule is applied.
Instead, a fresh Int element is generated each time the rule is applied.
In K, we can define syntactic categories which have the capability to
generate fresh elements like above, using unbound variables whose name starts
with a !. The details of how to do that are beyond the scope of this
tutorial (see Tutorial 6). All we need to know here is that an arbitrary
fresh element of that syntactic category is generated each time the rule
is applied. We cannot rely on the particular name or value of the generated
element, because that can change with the next version of the K tool, or
even from execution to execution with the same version. All you can rely
on is that each newly generated element is distinct from the previously
generated elements for the same syntactic category.

Unlike in the substitution-based definition, we now also need a lookup rule:

rule <k> X => V ...</k>
     <env>... X |-> N ...</env>
     <store>... N |-> V ...</store>

This rule speaks for itself: replace X by the value V located in the store
at X's location N in the current environment.

The only thing left to define is the auxiliary environment-recovery operation:

rule <k> _:Val ~> (Rho => .) ...</k> <env> _ => Rho </env>

When the item preceding the environment recovery task Rho in the
computation becomes a value, replace the current environment with Rho
and dissolve Rho from the computation.

Before we kompile, let us make this rule and the lambda evaluation rule
structural, because we do not want these to count as transitions.

Let us kompile and ... fail:

kompile lambda

gives a parsing error saying that V:Val does not fit there in the closure
invocation rule. That's because Val and Exp are currently completely
disconnected, so K rightfully complains that we want to apply a value to
another one, because application was defined to work with expressions, not
values. What we forgot here was to state that Exp includes Val:

syntax Exp ::= Val

Now everything works, but it is a good time to reflect a bit.

So we added closures, which are inherently semantic entities, to the syntax
of expressions. Does that mean that we can now write LAMBDA programs with
closures in them? Interestingly, with our current definition of LAMBDA,
which purposely did not follow the nice organization of IMP into syntax and
semantic modules, and with K's default parser, kast, you can. But you are
not supposed to exploit this! In fact, if you use an external parser, that
parser will reject programs with explicit closures. Also, if we split the
LAMBDA definition into two modules, one called LAMBDA-SYNTAX containing
exclusively the desired program syntax and one called LAMBDA importing the
former and defining the syntax of the auxiliary operations and the semantics,
then even K's default parser will reject programs using auxiliary syntactic
constructs.

Indeed, when you kompile a language, say lang.k, the tool will by default
attempt to find a module LANG-SYNTAX and generate the program parser from
that. If it cannot find it, then it will use the module LANG instead. There
are also ways to tell kompile precisely which syntax module you want to use
for the program parser if you don't like the default convention.
See kompile --help.
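
For example, assuming the desired syntax module is called LAMBDA-SYNTAX,
the invocation would look something like:

kompile lambda.k --syntax-module LAMBDA-SYNTAX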

Another insightful thing to reflect upon is the relationship between your
language's values and other syntactic categories. It is often the case that
values form a subset of the original language syntax, like in IMP (Part 2 of
the tutorial), but sometimes that is not true, like in our case here. When
that happens, in order for the semantics to be given smoothly and uniformly
using the original syntax, you need to extend your language's original
syntactic categories with the new values. The same holds true in other
semantic approaches, not only in K, even in ones which are considered purely
syntactic. As it should be clear by now, K does not enforce you to use a
purely syntactic style in your definitions; nevertheless, K does allow you to
develop purely syntactic definitions, like LAMBDA in Part 1 of the tutorial,
if you prefer those.

krun some programs, such as those provided in Lesson 1 of the LAMBDA
tutorial (Part 1). Note the closures, both as results in the <k/> cell,
and as values in the store. Also, since variables are not values anymore,
expressions that contain free variables may get stuck with one of those on
top of their computation. See, for example, free-variable-capture.lambda,
which gets stuck on z, because z is free, so it cannot evaluate it.
If you want, you can go ahead and manually provide a configuration with
z mapped to some location in the environment and that location mapped to
some value in the store, and then you can also execute this program. The
program omega.lambda should still loop.

Although we completely changed the definitional style of LAMBDA, the semantics
of the other constructs do not need to change, as seen in the next lesson.

Go to Lesson 3, LAMBDA++: Reusing Existing Semantics.

MOVIE (out of date) [8'02"]

Reusing Existing Semantics

In this lesson we will learn that, in some cases, we can reuse existing
semantics of language features without having to make any change!

Although the definitional style of the basic LAMBDA language changed quite
radically in our previous lesson, compared to its original definition in
Part 1 of the tutorial, we fortunately can reuse a large portion of the
previous definition. For example, let us just cut-and-paste the rest of the
definition from Lesson 7 in Part 1 of the tutorial.

Let us kompile and krun all the remaining programs from Part 1 of the
tutorial. Everything should work fine, although the store contains lots of
garbage. Garbage collection is an interesting topic, but we do not do it
here. Nevertheless, much of this garbage is caused by the intricate use of
the fixed-point combinator to define recursion. In a future lesson in this
tutorial we will see that a different, environment-based definition of
fixed-points will allocate much less memory.

One interesting question at this stage is: how do we know when we can reuse
an existing semantics of a language feature? Well, I'm afraid the answer is:
we don't. In the next lesson we will learn how reuse can fail for quite subtle
reasons, which are impossible to detect statically (and some non-experts may
fail to even detect them at all).

Go to Lesson 4, LAMBDA++: Do Not Reuse Blindly!.

MOVIE (out of date) [3'21"]

Do Not Reuse Blindly!

It may be tempting to base your decision to reuse an existing semantics of
a language feature solely on syntactic considerations; for example, to reuse
whenever the parser does not complain. As seen in this lesson, this could
be quite risky.

Let's try (and fail) to reuse the definition of callcc from Lesson 1:

syntax Exp ::= "callcc" Exp  [strict]
syntax Val ::= cc(K)
rule <k> (callcc V:Val => V cc(K)) ~> K </k>
rule <k> cc(K) V ~> _ =>  V ~> K </k>

The callcc examples that we tried in Lesson 1 work, so it may look like it works.

However, the problem is that cc(K) should also include an environment,
and that environment should also be restored when cc(K) is invoked.
Let's try to illustrate this bug with callcc-env1.lambda

let x = 1 in
  ((callcc lambda k . (let x = 2 in (k x))) + x)

where the second argument of +, x, should be bound to the top x, which
is 1. However, since callcc does not restore the environment, that x
ends up being looked up in the wrong, callcc-inner environment, so we expect
the overall result 4.

Hm, we get the right result, 3 ... (Note: you may get 4, depending on
your version of K and platform; but both 3 and 4 are possible results, as
explained below and seen in the tests). How can we get 3? Well, recall that
+ is strict, which means that it can evaluate its arguments in any order.
It just happened that in the execution that took place above its second
argument was evaluated first, to 1, and then the callcc was evaluated, but
its cc value K had already included the 1 instead of x ... In Part 4 of
the tutorial we will see how to explore all the non-deterministic behaviors of
a program; we could use that feature of K to debug semantics, too.
For example, in this case, we could search for all behaviors of this program
and we would indeed get two possible value results: 3 and 4.

One may think that the problem is the non-deterministic evaluation order
of +, and thus that all we need to do is to enforce a deterministic order
in which the arguments of + are evaluated. Let us follow this path to
see what happens. There are two simple ways to make the evaluation order
of +'s arguments deterministic. One is to make + seqstrict in the
semantics, to enforce its evaluation from left-to-right. Do it and then
run the program above again; you should get only one behavior for the
program above, 4, which therefore shows that copying-and-pasting our old
definition of callcc was incorrect. However, as seen shortly, that only
fixed the problem for the particular example above, but not in general.
Another conventional approach to enforce the desired evaluation order is to
modify the program to enforce the left-to-right evaluation order using let
binders, as we do in callcc-env2.lambda:

let x = 1 in
  let a = callcc lambda k . (let x = 2 in (k x)) in
    let b = x in
      (a + b)

With your installation of K you may get the "expected" result 4 when you
execute this program, so it may look like our non-deterministic problem is
fixed. Unfortunately, it is not. Using the K tool to search for all the
behaviors in the program above reveals that the final result 3 is still
possible. Moreover, both the 3 and the 4 behaviors are possible regardless
of whether + is declared to be seqstrict or just strict. How is that
possible? The problem is now the non-deterministic evaluation strategy of
the function application construct. Indeed, recall that the semantics of
the let-in construct is defined by desugaring to lambda application:

rule let X = E in E' => (lambda X . E') E     [macro]

With this, the program above eventually reduces to

(lambda a . ((lambda b . a + b) x))
(callcc lambda k . (let x = 2 in (k x)))

in an environment where x is 1. If the first expression evaluates first,
then it does so to a closure in which x is bound to a location holding 1,
so when applied later on to the x inside the argument of callcc (which is
2), it will correctly lookup x in its enclosed environment and thus the
program will evaluate to 3. On the other hand, if the second expression
evaluates first, then the cc value freezes the first expression as is,
severing the relationship between its x and the current environment, in
which x is bound to 1; x is then inadvertently captured by the environment
of the let-in construct inside the callcc, making the entire expression
evaluate to 4.

So the moral is: Do not reuse blindly. Think!

In the next lesson we fix the environment-based semantics of callcc by having
cc also wrap an environment, besides a computation. We will also give a more
direct semantics to recursion, based on environments instead of fixed-point
combinators.

Go to Lesson 5, LAMBDA++: More Semantic Computation Items.

MOVIE (out of date) [3'37"]

More Semantic Computation Items

In this lesson we see more examples of semantic (i.e., non-syntactic)
computational items, and how useful they can be. Specifically, we fix the
environment-based definition of callcc and give an environment-based
definition of the mu construct for recursion.

Let us first fix callcc. As discussed in Lesson 4, the problem that we
noticed there was that we only recovered the computation, but not the
environment, when a value was passed to the current continuation. This is
quite easy to fix: we modify cc to take both an environment and a
computation, and its rules to take a snapshot of the current environment with
it, and to recover it at invocation time:

syntax Val ::= cc(Map,K)
rule <k> (callcc V:Val => V cc(Rho,K)) ~> K </k> <env> Rho </env>
rule <k> cc(Rho,K) V:Val ~> _ =>  V ~> K </k> <env> _ => Rho </env>

Let us kompile and make sure it works with the callcc-env2.lambda program,
which should evaluate to 3, not to 4.

Note that the cc value, which can be used as a computation item in the <k/>
cell, is now quite semantic in nature, pretty much the same as the closures.

Let us next add one more closure-like semantic computational item, for mu.
But before that, let us reuse the semantics of letrec in terms of mu that
was defined in Lesson 8 of Part 1 of the tutorial on LAMBDA:

syntax Exp ::= "letrec" Id Id "=" Exp "in" Exp
             | "mu" Id "." Exp      [latex(\mu{#1}.{#2})]
rule letrec F:Id X = E in E' => let F = mu F . lambda X . E in E'    [macro]

We removed the binder annotation of mu, because it is no longer necessary
(we no longer work with substitution).

To save the number of locations needed to evaluate mu X . E, let us replace
it with a special closure which already binds X to a fresh location holding
the closure itself:

syntax Exp ::= muclosure(Map,Exp)

rule <k> mu X . E => muclosure(Rho[X <- !N], E) ...</k>
     <env> Rho </env>
     <store>... .Map => (!N:Int |-> muclosure(Rho[X <- !N], E)) ...</store>
  [structural]

Since E needs to be evaluated each time mu X . E is encountered during the
evaluation, we conclude that muclosure cannot be a value. We can declare
it either as an expression or as a computation. Let's go with the former.

Finally, here is the rule unrolling the muclosure:

rule <k> muclosure(Rho,E) => E ~> Rho' ...</k>
     <env> Rho' => Rho </env>

Note that the current environment Rho' needs to be saved before and
restored after E is executed, because the fixed point may be invoked
from a context with a completely different environment from the one
in which mu X . E was declared.

We are done. Let us now kompile and krun factorial-letrec.lambda from
Lesson 7 in Part 1 of the tutorial on LAMBDA. Recall that in the previous
lesson this program generated a lot of garbage into the store, due to the
need to allocate space for the arguments of all those lambda abstractions
needed to run the fixed-point combinator. Now we need far fewer locations,
essentially only locations for the argument of the factorial function, one at
each recursive call. Anyway, much better than before.

In the next lesson we wrap up the environment definition of LAMBDA++ and
generate its documentation.

Go to Lesson 6, LAMBDA++: Wrapping Up and Documenting LAMBDA++.

MOVIE (out of date) [5'19"]

Wrapping Up and Documenting LAMBDA++

In this lesson we wrap up and nicely document LAMBDA++. In doing so, we also
take the liberty of reorganizing the semantics a bit, to make it look better.

See the lambda.k file, which is self-explanatory.

Part 3 of the tutorial is now complete. Part 4 will teach you more features
of the K framework, in particular how to exhaustively explore the behaviors
of non-deterministic or concurrent programs.

MOVIE (out of date) [6'23"]

Part 4: Defining IMP++

IMP++ extends IMP, which was discussed in Part 2 of this tutorial, with several
new syntactic constructs. Also, some existing syntax is generalized, which
requires non-modular changes of the existing IMP semantics. For example,
global variable declarations become local declarations and can occur
anywhere a statement can occur. In this tutorial we will learn the following:

  • That (and how) existing syntax/semantics may change as a language evolves.
  • How to refine configurations as a language evolves.
  • How to define and use fresh elements of desired sorts.
  • How to tag syntactic constructs and rules, and how to use such tags
    with the superheat/supercool/transition options of kompile.
  • How the search option of krun works.
  • How to stream cells holding semantic lists to the standard input/output,
    and thus obtain interactive interpreters for the defined languages.
  • How to delete, save and restore cell contents.
  • How to add/delete cells dynamically.
  • More details on how the configuration abstraction mechanism works.

Like in the previous tutorials, this folder contains several lessons, each
adding new features to IMP++. Do them in order and make sure you completed
and understood the previous tutorials.

Extending/Changing an Existing Language Syntax

Here we learn how to extend the syntax of an existing language, both with
new syntactic constructs and with more general uses of existing constructs.
The latter, in particular, requires changes of the existing semantics.

Consider the IMP language, as defined in Lesson 4 of Part 2 of the tutorial.

Let us first add the new syntactic constructs, with their precedences:

  • variable increment, ++, which increments an integer variable and
    evaluates to the new value;
  • read, which reads and evaluates to a new integer from the input buffer;
  • print, which takes a comma-separated list of arithmetic expressions and
    evaluates and prints each of them in order, from left to right, to the
    output buffer; we therefore define a new list syntactic category, AExps,
    which we pass as an argument to print; note we do not want to declare
    print to be strict, because we do not want to first evaluate the
    arguments and then print them (for example, if the second argument performs
    an illegal operation, say division by zero, we still want to print the first
    argument); we also go ahead and add strings as arithmetic expressions,
    because we intend print to also take strings, in order to print nice
    messages to the user;
  • halt, which abruptly terminates the program; and
  • spawn, which takes a statement and creates a new concurrent thread
    executing it and sharing its environment with the parent thread.

Also, we want to allow local variable declarations, which can appear anywhere
a statement can appear. Their scope ranges from the place they are defined
until the end of the current block, and they can shadow previous declarations,
both inside and outside the current block. The simplest way to define the
syntax of the new variable declarations is as ordinary statements, at the same
time removing the previous Pgm syntactic category and its construct.
Programs are now just statements.

We are now done with adding the new syntax and modifying the old one.
Note that the old syntax was modified in a way which makes the previous IMP
programs still parse, but this time as statements. Let us then modify
the configuration variable $PGM to have the sort Stmt instead of Pgm,
and let us try to run the old IMP programs, for example sum.imp.

Note that they actually get stuck with the global declaration on the top
of their computations. This is because variable declarations are now treated
like any statements, in particular, the sequential composition rule applies.
This makes the old IMP rule for global variable declarations not match anymore.
We can easily fix it by replacing the anonymous variable _, which matched
the program's statement that now turned into the remaining computation in
the <k/> cell, with the cell frame variable ..., which matches the
remaining computation. Similarly, we have to change the rule for the case
where there are no variables left to declare into one that dissolves itself.

We can now run all the previous IMP programs, in spite of the fact that
our IMP++ semantics is incomplete and, more interestingly, in spite of the
fact that our current semantics of blocks is incorrect in what regards the
semantics of local variable declarations (note that the old IMP programs do
not declare block-local variables, which is why they still run correctly).

Let us also write some proper IMP++ programs, which we would like to execute
once we give semantics to the new constructs.

div.imp is a program manifesting non-deterministic behaviors due to the
desired non-deterministic evaluation strategy of division and the fact that
expressions will have side effects once we add variable increment. We will
be able to see all the different behaviors of this program. Challenge: can
you identify the behavior where the program performs a division-by-zero?

If we run div.imp now, it will get stuck with the variable increment
construct on top of the computation cell. Once we give it a semantics,
div.imp will execute completely (all the other constructs in div.imp
already have their semantics defined as part of IMP).

Note that some people prefer to define all their semantics in a by-need
style: they first write and parse lots of programs, and then they add
semantics to each language construct on which any of the programs gets
stuck, and so on and so forth, until they can run all the programs.

io.imp is a program which exercises the input/output capabilities of the
language: reads two integers and prints three strings and an integer.
Note that the variable declaration is not the first statement anymore.

sum-io.imp is an interactive variant of the sum program.

spawn.imp is a program which dynamically creates two threads that interact
with the main thread via the shared variable x. Lots of behaviors will be
seen here once we give spawn the right semantics.

Finally, locals.imp tests whether variable shadowing/unshadowing works well.

In the next lesson we will prepare the configuration for the new constructs,
and will see what it takes to adapt the semantics to the new configuration.
Specifically, we will split the state cell into an environment cell and a
store cell, like in LAMBDA++ in Part 3 of the tutorial.

Go to Lesson 2, IMP++: Configuration Refinement; Freshness.

MOVIE (out of date) [07'47"]

Configuration Refinement; Freshness

To prepare for the semantics of threads and local variables, in this lesson we
split the state cell into an environment and a store. The environment and
the store will be similar to those in the definition of LAMBDA++ in Part
3 of the Tutorial. This configuration refinement will require us to change
some of IMP's rules, namely those that used the state.

To split the state map, which binds program variables to values, into an
environment mapping program variables to locations and a store mapping
locations to values, we replace in the configuration declaration the cell

<state color="red"> .Map </state>

with two cells

<env color="LightSkyBlue"> .Map </env>
<store color="red"> .Map </store>

Structurally speaking, this split of a cell into other cells is a major
semantic change, which, unfortunately, requires us to revisit the existing
rules that used the state cell. One could, of course, argue that we could
have avoided this problem if we had followed from the very beginning the
good-practice style to work with an environment and a store, instead of a
monolithic state. While that is a valid argument, highlighting the fact that
modularity is not a feature of the framework alone, but also requires the
language designer to follow good practices, it is also true that if all we
wanted in Part 2 of the tutorial was to define IMP as is, then splitting the
state into an environment and a store would have been unnecessary and hard
to justify.

The first rule which used a state cell is the lookup rule:

rule <k> X:Id => I ...</k> <state>... X |-> I ...</state>

We modify it as follows:

rule <k> X:Id => I ...</k>
     <env>... X |-> N ...</env>
     <store>... N |-> I ...</store>

So we first match the location N of X in the environment, then the value
I at location N in the store, and finally we rewrite X to I in the
computation. This rule also shows an instance of a more complex
multiset matching, where two variables (X and N) are each matched twice.

The assignment rule is modified quite similarly.
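
Concretely, assuming IMP's original assignment syntax X = I;, the refined
rule might look like this sketch:

rule <k> X = I:Int; => . ...</k>
     <env>... X |-> N ...</env>             // find X's location N
     <store>... N |-> (_ => I) ...</store>  // overwrite the value at N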

The variable declaration rule is trickier, though, because we need to allocate
a fresh location in the store and bind the newly declared variable to it.
This is quite similar to the way we allocated space for variables in
the environment-based definition of LAMBDA++ in Part 3 of the tutorial.

rule <k> int (X,Xs => Xs); ...</k>
     <env> Rho => Rho[X <- !N:Int] </env>
     <store>... .Map => !N |-> 0 ...</store>

Note the use of the fresh (!N) variable notation above. Recall from
the LAMBDA++ tutorial that each time the rule with fresh (!) variables is
applied, fresh elements of corresponding sorts are generated for the fresh
variables, distinct from all the previously generated elements; also, we
cannot and should not assume anything about the particular element that is
being generated, except that it is different from the previous ones.

kompile and krun sum.imp to see how the fresh locations have been
generated and used. There were two fresh locations needed, for the two
variables. Note also that a cell holding the counter has been added to the
configuration.

In the next lesson we will add the semantics of variable increment, and see
how that yields non-deterministic behaviors in programs and how to explore
those behaviors using the K tool.

Go to Lesson 3, IMP++: Tagging; Transition Kompilation Option.

MOVIE (out of date) [04'06"]

Tagging; Transition Kompilation Option

In this lesson we add the semantics of variable increment. In doing so, we
learn how to tag syntactic constructs and rules and then use such tags to
instruct the kompile tool to generate the desired language model that is
amenable for exhaustive analysis.

The variable increment rule is self-explanatory:

rule <k> ++X => I +Int 1 ...</k>
     <env>... X |-> N ...</env>
     <store>... N |-> (I => I +Int 1) ...</store>

We can now run programs like our div.imp program introduced in Lesson 1.
Do it.

The addition of increment makes the evaluation of expressions have side
effects. That, in combination with the non-determinism allowed by the
strictness attributes in how expression constructs evaluate their
arguments, makes expressions in particular and programs in general have
non-deterministic behaviors. One possible execution of the div.imp program
assigns 1 to y's location, for example, but this program manifests several
other behaviors, too.

To see all the (final-state) behaviors that a program can have, you can call
the krun tool with the option --search. For example:

krun div.imp --search

Oops, we see only one solution, the same as when we ran it without search.

Here is what happens. krun can only explore as much of the transition
system associated to a program as kompile allowed the generated language
model to yield. Since most K users are interested in language models
that execute efficiently, that is, in faster interpreters for the defined
languages, by default kompile optimizes the generated language model for
execution. In particular, it inserts no backtracking markers, which krun
uses when called with the --search option in order to systematically generate
the entire transition system associated to a program. This is why krun
showed us only one solution when run with the --search option on div.imp.

We next explain how to tell kompile what kind of language model we are
interested in for analysis purposes. When you experiment with non-determinism
in a language semantics, you should keep in mind that the --transition
option of kompile allows you to configure what counts as a transition in
your language model. We here only discuss transitions due to the
non-deterministic evaluation strategies of language constructs, but we will
see in future lectures (see Lesson 6 of IMP++, where we add concurrency) that
we can also have transitions due to non-deterministic applications of rewrite
rules.

If you want to explore the entire behavior space due to non-deterministic
evaluation strategies, then you should include all the language constructs
in the --transition option. This may sound like the obvious thing to
always do, but once you do it you quickly realize that it is way too much
in practice when you deal with large languages or programs. There are simply
too many program behaviors to consider, and krun will likely hang
or crash on you. For example, a small ten-statement program where each
statement uses one strict expression construct already has 1000+ behaviors for
krun to explore! Driven by practical needs of its users, the K tool
therefore allows you to finely tune the generated language models using the
--transition option.

To state which constructs are to be considered to generate transitions in the
generated language model, and for other reasons, too, the K tool allows you to
tag any production and any rule. You can do this the same way we tagged
rules with the structural keyword in earlier tutorials: put the tag in
brackets. You can associate multiple tags with the same construct or rule,
and more than one construct or rule can have the same tag. As an example,
let us tag the division construct with division, the lookup rule with lookup,
and the increment rule with increment. The rule tags are not needed in this
lesson; we add them only to demonstrate that rules can also be tagged.
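
For example, hedging on the exact attribute lists, the tagged production and
lookup rule might look like the following sketch (the increment rule would
similarly get an increment tag):

syntax AExp ::= AExp "/" AExp  [left, strict, division]

rule <k> X:Id => I ...</k>
     <env>... X |-> N ...</env>
     <store>... N |-> I ...</store>  [lookup]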

The least intrusive way to enforce our current language to explore the
entire space of behaviors due to the strictness of division is to kompile it
with the following option:

kompile imp.k --transition "division"

It is interesting to note that the lookup and increment rules are the only
two rules which can trigger non-deterministic behaviors for division, because
no other rule but these two can ever apply while a division operation is
heated. Previous versions of K allowed you to also specify which rules could
trigger non-deterministic behaviors of operator evaluation strategies,
but that option was rarely used and is not available anymore.

Note that it is highly non-trivial to say precisely whether a strict language
construct may yield non-deterministic behaviors. For example, division's
strictness would yield no non-determinism if the language had no side effects.
It is even harder to say so for a particular program. Consequently, our K
implementation makes no attempt to automatically detect which operations
should be tagged as transitions. Instead, it provides the functionality to
let you decide it.

Now the command

krun div.imp --search

shows us all five behaviors of this program. Interestingly, one
of the five behaviors yields a division by zero!

The --transition option can be quite useful when you experiment with your
language designs or when you formally analyze programs for certain kinds of
errors. Please let us know if you ever need finer-grained control over
the non-determinism of your language models.

Before we conclude this lesson, we'd like to let you know one trick, which
you will hopefully not overuse: you can tag elements in your K definition with
kompile option names, and those elements will be automatically included in
their corresponding options. For example, if you tag the division production
with transition then the command

kompile imp

is completely equivalent to the previous kompile command.
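
That is, hedging on the other attributes, the production would be declared
along these lines:

syntax AExp ::= AExp "/" AExp  [left, strict, transition]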

Please use this default behavior with caution, or even better, try to avoid
using it! You may be tempted to add the transition tag to lots of elements
and then forget about them; your language models will then become increasingly
slow when you execute them, and you may wonder why... This convention is typically
convenient when you want to quickly experiment with non-determinism and do not
want to bother inventing tag names and calling kompile with options.

In the next lesson we add input/output to our language and learn how to
generate a model of it which behaves like an interactive interpreter!

Go to Lesson 4, IMP++: Semantic Lists; Input/Output Streaming.

MOVIE (out of date) [06'56"]

Semantic Lists; Input/Output Streaming

In this lesson we add semantics to the read and print IMP++ constructs.
In doing so, we also learn how to use semantic lists and how to connect
cells holding semantic lists to the standard input and standard output.
This allows us to turn the K semantics into an interactive interpreter.

We start by adding two new cells to the configuration,

<in color="magenta"> .List </in>
<out color="Orchid"> .List </out>

each holding a semantic list, initially empty. Semantic lists are
space-separated sequences of items, each item being a term of the form
ListItem(t), where t is a term of sort K. Recall that the semantic maps,
which we use for states, environments, stores, etc., are sets of pairs
t1 |-> t2, where t1 and t2 are terms of sort K. The ListItem wrapper
is currently needed to avoid parsing ambiguities.

Since we want the print statement to also print strings, we need to tell
K that strings are results. To make it more interesting, let us also overload
the + symbol on arithmetic expressions to also take strings and, as a
result, to concatenate them. Since + is already strict, we only need to add
a rule reducing the IMP addition of strings to the builtin operation +String
which concatenates two strings.
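
A minimal sketch of these two additions, assuming strings were already added
to AExp in Lesson 1:

syntax KResult ::= String          // strings are now results

rule S1:String + S2:String => S1 +String S2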

The semantics of read is immediate: it reads and consumes the first integer
item from the <in/> cell. Note that our read only reads integer values (it
gets stuck if the first item in the <in/> cell is not an integer).
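
Assuming read was declared as a nullary construct read(), its rule might be
sketched as follows:

rule <k> read() => I ...</k>                // read() assumed; adjust to your syntax
     <in> ListItem(I:Int) => .List ...</in> // consume the first input item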

The semantics of print is a bit trickier. Recall that print takes an
arbitrary number of arithmetic expression arguments, and evaluates and outputs
each of them in order, from left to right. For example,
print("Hello", 3/0, "Bye"); outputs "Hello" and then gets stuck on the
illegal division by zero operation. In other words, we do not want it to
first evaluate all its arguments and then print them, because that would miss
outputting potentially valuable information. So the first step is to evaluate
the first argument of print. In some sense, what we'd like to say is that
print has the evaluation strategy strict(1). However, strictness
attributes only work with individual language constructs, while what we need
is an evaluation strategy that involves two constructs: print and the list
(comma) construct of AExps. If we naively associate print the strict(1)
evaluation strategy then its first and unique argument, an AExps list, will
be scheduled for evaluation and the execution will get stuck because we have
no rules for evaluating AExps terms. If we make the list construct of
AExps strict then we get the wrong semantics for print which first
evaluates all its arguments and then outputs them. The correct way to
tell K that print should evaluate only its first argument is by using a
context declaration:

context print(HOLE:AExp, _);

Note the HOLE of sort AExp above. Contexts allow us to define finer-grain
evaluation strategies than the strictness attributes, involving potentially
more than one language construct, like above. The HOLE indicates the
argument which is requested to be evaluated. For example, the strict
attribute of division corresponds to two contexts:

context HOLE / _
context _ / HOLE

In their full generality, contexts can be any terms with precisely one
occurrence of a HOLE, and with arbitrary side conditions on any variables
occurring in the context term as well as on the HOLE. See Part 6 of the
tutorial for more examples.

Once evaluated, the first argument of print is expected to become either an
integer or a string. Since we want to print both integers and string values,
to avoid writing two rules, one for each type of value, we instead add a new
syntactic category, Printable, which is the union of integers and strings.
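
One plausible rendering of the print semantics, hedging on the exact AExps
list syntax:

syntax Printable ::= Int | String
syntax AExp ::= Printable

rule <k> print((P:Printable, AEs) => AEs); ...</k>  // output head, keep tail
     <out>... .List => ListItem(P) </out>

rule print(.AExps); => .  [structural]              // dissolve when done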

Let us kompile and krun the io.imp program discussed in Lesson 1. As
expected, it gets stuck with a read construct on top of the computation and
with an empty <in/> cell. To run it, we need to provide some items in the
<in/> cell, so that the rule of read can match. Let us add

<in> ListItem(3) ListItem(5) ListItem(7) </in>

Now, if we krun io.imp, we can see that its execution completes normally
(the <k/> cell is empty), that the first two items have been removed by the
two read constructs from the <in/> cell, and that the desired strings and
numbers have been placed into the <out/> cell.

Cells holding semantic lists can be connected to the standard input and
standard output buffers, and krun knows how to handle these appropriately.
Let us connect the <in/> cell to the standard input using the cell attribute
stream="stdin" and the <out/> cell to the standard output with the
attribute stream="stdout". A cell connected to the standard input will
take its items from the standard input and block the rewriting process when
an input is needed until an item is available in the standard input buffer.
A cell connected to the standard output buffer will send all its items, in
order, to the standard output.

Let us kompile and krun io.imp again. It prints the message and then
waits for your input numbers. Type in two numbers, then press <Enter>.
A message with their sum is then printed, followed by the final configuration.
If you do not want to see the final configuration, and thus obtain a realistic
interpreter for our language, then call krun with the option --output none:

krun io.imp --output none

Let us now krun our interactive sum program, which continuously reads numbers
from the console and prints the sum of numbers up to them:

krun sum-io.imp

Try a few numbers, then 0. Note that the program terminated, but with junk
in the <k/> cell, essentially with a halt statement on top. That is, of
course, because halt has been reached but has no semantics yet.

In the next lesson we give the semantics of halt and also fix the semantics
of blocks with local variable declarations.

Go to Lesson 5, IMP++: Deleting, Saving and Restoring Cell Contents.

MOVIE (out of date) [05'21"]

Deleting, Saving and Restoring Cell Contents

In this lesson we will see how easily we can delete, save and/or restore
contents of cells in order to achieve the desired semantics of language
constructs that involve abrupt changes of control or environments. We have
seen similar or related K features in the LAMBDA++ language in Part 3 of the
tutorial.

Let us start by adding semantics to the halt statement. As its name says,
what we want is to abruptly terminate the execution of the program. Moreover,
we want the program configuration to look as if the program terminated
normally, with an empty computation cell. The simplest way to achieve that is
to simply empty the computation cell when halt is encountered:

rule <k> halt; ~> _ => . </k>

It is important to mention the entire <k/> cell here, with both its membranes
closed, to make sure that its entire contents is discarded. Note the
anonymous variable, which matches the rest of the computation.

kompile and krun sum-io.imp. Note that unlike in Lesson 4, the program
terminates with an empty computation cell now.

As mentioned earlier, the semantics of blocks that was inherited from IMP is
wrong. Program locals.imp shows it very clearly: the environments are not
correctly restored at block exits. One way to fix the problem is to take
a snapshot of the current environment when a block is entered and save it
somewhere, and then to restore it when the block is left. There are many
ways to do this, which you can explore on your own: for example, you can add
a new list cell for this task, in which you push/pop the environment snapshots
stack-style; or you can use the existing environment cell for this purpose,
but then you need to change the variable access rules to search through the
stacked environments for the variable.

My preferred solution is to follow a style similar to how we saved/restored
LAMBDA++ environments in Part 3 of the Tutorial, namely to use the already
existing <k/> cell for such operations. More specifically, we place a
reminder item in the computation whenever we need to take a snapshot of
some cell contents; the item simply consists of the entire contents of the cell.
Then, when the reminder item is reached, we restore the contents of the cell.
For blocks, the snapshot-taking rule is:

rule <k> {S} => S ~> Rho ...</k> <env> Rho </env>  [structural]

The only thing left now is to give the definition of environment restore:

rule <k> Rho => . ...</k> <env> _ => Rho </env>    [structural]

Done. kompile and krun locals.imp. Everything should work correctly now.
Note that the rule above is different from the one we had for LAMBDA++ in
Part 3 of the tutorial, in that here there is no value preceding the environment
restoration item in the computation; that's because IMP++ statements,
unlike LAMBDA++'s expressions, evaluate to nothing (.).

In the next lesson we will give semantics to the spawn S construct, which
dynamically creates a concurrent shared-memory thread executing statement S.

Go to Lesson 6, IMP++: Adding/Deleting Cells Dynamically; Configuration Abstraction, Part 2.

MOVIE (out of date) [04'30"]

Adding/Deleting Cells Dynamically; Configuration Abstraction, Part 2

In this lesson we add dynamic thread creation and termination to IMP, and
while doing so we learn how to define and use configurations whose structure
can evolve dynamically.

Recall that the intended semantics of spawn S is to spawn a new concurrent
thread that executes S. The new thread is being passed at creation time
its parent's environment, so it can share with its parent the memory
locations that its parent had access to at creation time. No other locations
can be shared, and no other memory sharing mechanism is available.
The parent and the child threads can evolve unrestricted, in particular they
can change their environments by declaring new variables or shadowing existing
ones, can create other threads, and so on.

The above suggests that each thread should have its own computation and its
own environment. This can be elegantly achieved if we group the <k/> and
<env/> cells in a <thread/> cell in the configuration. Since at any given
moment during the execution of a program there could be zero, one or more
instances of such a <thread/> cell in the configuration, it is a good idea
to declare the <thread/> cell with multiplicity * (i.e., zero, one or more):

<thread multiplicity="*" color="blue">
  <k color="green"> $PGM:Stmt </k>
  <env color="LightSkyBlue"> .Map </env>
</thread>

This multiplicity declaration is not necessary, but it is a good idea to do
it for several reasons:

  1. it may help the configuration abstraction process,
    which may in turn significantly increase the compactness and modularity of
    your subsequent rules;
  2. it may help various analysis and execution tools,
    for example static analyzers to give you error messages when you create cells
    where you should not, or K compilers to improve performance by starting
    actual concurrent hardware threads or processes corresponding to each cell
    instance; and
  3. it may help you better understand and control the dynamics
    of your configuration, and thus your overall semantics.

For good encapsulation, I also prefer to put all thread cells into one cell,
<threads/>. This is technically unnecessary, though; to convince yourself
that this is indeed the case, you can remove this cell once we are done with
the semantics and everything will work without having to make any changes.

Before we continue, let us kompile and krun some programs that used to
work, say sum-io.imp. In spite of the relatively radical configuration
reorganization, those programs execute just fine! How is that possible?
In particular, why do rules like the lookup and assignment still work,
unchanged, in spite of the fact that the <k/> and <env/> cells are not at
the same level with the <store/> cell in the configuration anymore?

Welcome to configuration abstraction, part 2. Recall that the role of
configuration abstraction is to allow you to only write the relevant
information in each rule, and have the compiler fill in the obvious and boring
details. According to the configuration that we declared for our new
language, there is only one reasonable way to complete rules like the lookup,
namely to place the <k/> and <env/> cells inside a <thread/> cell,
inside a <threads/> cell:

rule <threads>...
       <thread>...
         <k> X:Id => I ...</k>
         <env>... X |-> N ...</env>
       ...</thread>
     ...</threads>
     <store>... N |-> I ...</store>  [lookup]

This is the most direct, compact and local way to complete the configuration
context of the lookup rule. If for some reason you wanted here to match the
<k/> cell of one thread and the <env/> cell of another thread, then you
would need to explicitly tell K so, by mentioning the two thread cells,
for example:

rule <thread>...
         <k> X:Id => I ...</k>
     ...</thread>
     <thread>...
         <env>... X |-> N ...</env>
     ...</thread>
     <store>... N |-> I ...</store>  [lookup]

By default, K completes rules in a greedy style. Think of it this way: what is the
minimal number of changes to my rule to make it fit the declared
configuration? That's what the K tool will do.

Configuration abstraction is technically unnecessary, but once you start
using it and get a feel for how it works, it will become your best friend.
It allows you to focus on the essentials of your semantics, and at the same
time gives you flexibility in changing the configuration later on without
having to touch the rules. For example, it allows you to remove the
<threads/> cell from the configuration, if you don't like it, without
having to touch any rule.

We are now ready to give the semantics of spawn:

rule <k> spawn S => . ...</k> <env> Rho </env>
     (. => <thread>... <k> S </k> <env> Rho </env> ...</thread>)

Note configuration abstraction at work, again. Taking into account
the declared configuration, and in particular the multiplicity information
* in the <thread/> cell, the only reasonable way to complete the rule
above is to wrap the <k/> and <env/> cells on the first line within a
<thread/> cell, and to fill in the ...s in the child thread with the
default contents of the other subcells in <thread/>. In this case there
are no other cells, so we can get rid of those ...s, but that would
decrease the modularity of this rule: indeed, we may later on add other
cells within <thread/> as the language evolves, for example a function
or an exception stack, etc.

In theory, we should be able to write the rule above even more compactly
and modularly, namely as

rule <k> spawn S => . ...</k> <env> Rho </env>
     (. => <k> S </k> <env> Rho </env>)

Unfortunately, this currently does not work in the K tool, due to some
known limitations of our current configuration abstraction algorithm.
This latter rule would be more modular, because it would not even depend
on the cell name thread. For example, we may later decide to change
thread into agent, and we would not have to touch this rule.
We hope this current limitation will be eliminated soon.

Once a thread terminates, its computation cell becomes empty. When that
happens, we can go ahead and remove the useless thread cell:

rule <thread>... <k> . </k> ...</thread> => .  [structural]

Let's see what we've got. kompile and krun spawn.imp.
Note the following:

  • The <threads/> cell is empty, so all threads terminated normally;
  • The value printed is different from the value in the store; the store value
    is not even the one obtained if the threads executed sequentially.

Therefore, interesting behaviors may happen; we would like to see them all!

Based on prior experience with krun's search option, we would hope that

krun spawn.imp --search

shows all the behaviors. However, the above does not work, for two reasons.

First, spawn.imp is an interactive program, which reads a number from the
standard input. When analyzing programs exhaustively using the search option,
krun has to disable the streaming capabilities (just think about it and you
will realize why). The best you can do in terms of interactivity with search
is to pipe some input to krun: krun will flush the standard input buffer
into the cells connected to it when creating the initial configuration (it
will do that whether or not you pass the --search option).
For example:

echo 23 | krun spawn.imp --search

puts 23 in the standard input buffer, which is then transferred in the
<in/> cell as a list item, and then the exhaustive search procedure is
invoked.

Second, even after piping some input, the spawn.imp program still manifests
only one behavior, which does not seem right. There should be many more.

As explained in Lesson 3, by default kompile optimizes the generated
language model for execution. In particular, it does not insert any
backtracking markers where transition attempts should be made, so krun
lacks the information it needs to exhaustively search the generated language
model. Like we did in Lesson 3 with the language constructs, we also have
to explicitly tell kompile which rules should be considered as actual
transitions. A theoretically correct but practically infeasible approach
to search all possible behaviors is to consider all rules as transitions.
Even more than with the non-deterministic strictness of language constructs
in Lesson 3, such a naive solution would make the number of behaviors, and
thus krun, explode. Remember that a two-thread program with 150 statements
each manifests more behaviors than particles in the known universe!
Consequently, unless your multi-threaded programs are very small, you will
most likely want to control which rules should be considered transitions and
which should not.

A good rule of thumb is to include as transitions only those rules which
compete for behaviors. That is, those rules which may yield a different
behavior if we choose to apply them when other rules match as well.
The rule for addition, for example, is a clear example of a rule which
should not be a transition: indeed, 3+7 will rewrite to 10 now and also
later. On the other hand, the lookup rule should be a transition. Indeed,
if we delay the lookup of variable x, then other threads may write x in the
meanwhile (with an increment or an assignment rule) and thus yield a
different behavior.

Let us discuss and tag those rules which should be transitions: lookup and
increment need to be transitions and we already tagged them in Lesson 3;
the read rule needs to also be a transition, because it may compete with
other instances of itself in other threads; assignment needs to also be a
transition, and so should be the first rule for print.

Let us now kompile with the transition option set as desired:

kompile imp --transition "lookup increment assignment read print"

Now echo 23 | krun spawn.imp --search gives us all 12 behaviors of the
spawn.imp program.

Like for the non-deterministically strict operations which can be tagged as
transitions, it is highly non-trivial to say precisely which rules need
to be transitions, so the K tool makes no attempt to detect them
automatically. Instead, it provides the functionality to let you decide.

We currently have no mechanism for thread synchronization. In the next lesson
we add a join statement, which allows a thread to wait until another completes.

Go to Lesson 7, IMP++: Everything Changes: Syntax, Configuration, Semantics.

MOVIE (out of date) [11'40"]

Everything Changes: Syntax, Configuration, Semantics

In this lesson we add thread joining, one of the simplest thread
synchronization mechanisms. In doing so, we need to add unique ids
to threads in the configuration, and to modify the syntax to allow spawn
to return the id of the newly created thread. This gives us an opportunity
to make several other small syntactic and semantics changes to the language,
which make it more powerful or more compact at a rather low cost.

Before we start, let us first copy and modify the previous spawn.imp program
from Lesson 1 to make use of thread joining. Recall from Lesson 6 that in some
runs of this program the main thread completed before the child threads,
printing a possibly undesired value of x. What we want now is to assign
unique ids to the two spawned threads, and then to modify the main thread to
join the two child threads before printing. To avoid adding a new type to
the language, let's assume that thread ids are integer numbers. So we declare
two integers, t1 and t2, and assign them the two spawn commands. In order
for this to parse, we will have to change the syntax of spawn to be an
arithmetic expression construct instead of a statement. Once we do that,
we have a slight syntactic annoyance: we need to put two consecutive ;
after the spawn assignment, one for the assignment statement inside the spawn,
and another for the outer assignment. To avoid the two consecutive semicolons,
we can syntactically enforce spawn to take a block as argument, instead of a
statement. Now it looks better. The new spawn.imp program is still
non-deterministic, because the two threads can execute in any order and even
continue to have a data-race on the shared variable x, but we should see fewer
behaviors when we use the join statements. If we want to fully synchronize
this program, we can have the second thread start with a join(t1) statement.
Then we should only see one behavior for this program.

Let us now modify the language semantics. First, we move the spawn
construct from statements to expressions, and make it take a block.
Second, we add one more sub-cell to the thread cell in the configuration,
<id/>, to hold the unique identifier of the thread. We want the main
thread to have id 0, so we initialize this cell with 0. Third, we modify
the spawn rule to generate a fresh integer identifier, which is put in the
<id/> cell of the child thread and returned as a result of spawn in the
parent thread. Fourth, let us add the join statement to the language,
both syntactically and semantically. So in order for the join(T) statement
to execute, thread T must have its computation empty. However, in order
for this to work we have to get rid of the thread termination cleanup rule.
Indeed, we need to store somewhere the information that thread T terminated;
the simplest way to do it is to not remove the terminated threads. Feel free
to experiment with other possibilities, too, here. For example, you may add
another cell, <done/>, in which you can store all the thread ids of the
terminated and garbage-collected threads.
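
Putting the pieces together, and hedging on the concrete syntax, the new
spawn and join rules might be sketched as follows:

rule <k> spawn S => !T:Int ...</k> <env> Rho </env>    // spawn returns the fresh id
     (. => <thread>... <k> S </k> <env> Rho </env> <id> !T </id> ...</thread>)

rule <k> join(T:Int); => . ...</k>                     // join(T) syntax assumed
     <thread>... <k> . </k> <id> T </id> ...</thread>  // thread T has terminated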

Let us now kompile imp.k and convince ourselves that the new spawn.imp
with join statements indeed has fewer behaviors than its variant without
join statements. Also, let us convince ourselves that the fully synchronized
variant of it indeed has only one behavior.

Note that now spawn, like variable increment, makes the evaluation of
expressions have side effects. Many programming languages in fact allow
expressions to be evaluated only for their side effects, and not for their
value. This is typically done by simply adding a ; after the expression
and thus turning it into a statement. For example, ++x;. Let us also
allow arithmetic expressions in our language to be used as statements, by
simply adding the production AExp ";" to Stmt, with evaluation strategy
strict and with the expected semantics discarding the value of the AExp.
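
A sketch of this change:

syntax Stmt ::= AExp ";"  [strict]

rule _:Int; => .   // discard the value once the expression is evaluated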

Another simple change in syntax and semantics which gives our language more
power, is to remove the ; from the syntax of variable assignments and to make
them expression instead of statement constructs. This change, combined with
the previous one, will still allow us to parse all the programs that we could
parse before, but will also allow us to parse more programs. For example, we
can now do sequence assignments like in C: x = y = z = 0. The semantics
of assignment now has to return the assigned value also to the computation,
because we want the assignment expression to evaluate to the assigned value.
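
A sketch of the new assignment, now an expression construct that also yields
the assigned value:

syntax AExp ::= Id "=" AExp  [strict(2)]

rule <k> X = I:Int => I ...</k>             // the assignment evaluates to I
     <env>... X |-> N ...</env>
     <store>... N |-> (_ => I) ...</store>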

Let us also make another change, but this time one which only makes the
definition more compact. Instead of defining statement sequential
composition as a binary construct for statements, let us define a new
syntactic construct, Stmts, as whitespace-separated lists of Stmt. This
allows us to get rid of the empty blocks, because we can change the syntax of
blocks to {Stmts} and Stmts also allows the empty sequence of statements.
However, we do have to make sure that .Stmts dissolves.
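
A sketch of the new Stmts category and its sequencing rules:

syntax Stmts ::= List{Stmt,""}

rule .Stmts => .                  [structural]   // the empty sequence dissolves
rule S:Stmt Ss:Stmts => S ~> Ss   [structural]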

In general, unless you are defining a well-established programming language,
it is quite likely that your definitions will suffer lots of changes like the
ones seen in this lecture. You add a new construct, which suggests changes
to the existing syntax that in fact make your language parse more programs,
which in turn require corresponding changes in the semantics, and so on.
Also, compact definitions are desirable in general, because they are easier
to read and easier to change if needed later.

In the next lesson we wrap up and document the definition of IMP++.

Go to Lesson 8, IMP++: Wrapping up Larger Languages.

Wrapping up Larger Languages

In this lesson we wrap up IMP++'s semantics and also generate its poster.
While doing so, we also learn how to display larger configurations in order
to make them easier to read and print.

Note that we rearrange the semantics a bit, to group the semantics of the old
IMP constructs together and separate it from the new IMP++ semantics.

There is a detailed discussion at the end of the document about the
--transition option of kompile, because that is important and we want
the poster to include everything we learned in this part of the tutorial.

You can go even further and manually edit the generated LaTeX document.
You typically want to do that when you want to publish your language
definition, or parts of it, and you need to finely tune it to fit the
editing requirements. For example, you may want to insert some negative
spaces, etc.

Part 4 of the tutorial is now complete. At this moment you should know most
of K framework's features and how to use the K tool. You can now define or
design your own programming languages, and then execute and analyze programs.

MOVIE (out of date) [06'26"]

Part 5: Defining Type Systems

In this part of the tutorial we will show that defining type systems for
languages is essentially no different from defining semantics. The major
difference is that programs and fragments of programs now rewrite to their
types, instead of to concrete values. In terms of K, we will learn how
to use it for a particular but important kind of application.

Imperative, Environment-Based Type Systems

In this lesson you learn how to define a type system for an imperative
language (the IMP++ language defined in Part 4 of the tutorial), using a style
based on type environments.

Let us copy the imp.k file from Part 4 of the tutorial, Lesson 7, which holds
the semantics of IMP++, and modify it into a type system. The resulting type
system, when executed, yields a type checker.

We start by defining the new strictness attributes of the IMP++ syntax.
While doing so, remember that programs and fragments of programs now reduce
to their types. So types will be the new results of our new (type) semantics.
We also clean up the semantics by removing the unnecessary tags, and also
use strict instead of seqstrict wherever possible, because strict gives
implementations more freedom. Interestingly, note that spawn is strict now,
because the code of the child thread should type in the current parent's type
environment. Note that this is not always the case for threads, see for example
SIMPLE in the languages tutorial, but it works here for our simpler IMP++.

From a typing perspective, the && construct is strict in both its arguments;
its short-circuit (concrete) semantics is irrelevant for its (static) type
system. Similarly, both the conditional and the while loop are strict
constructs when regarded through the typing lenses.

Finally, the sequential composition is now sequentially strict! Indeed,
statements are now going to reduce to their type, stmt, and it is critical
for sequential composition to type its argument statements left-to-right;
for example, imagine that the second argument is a variable declaration (whose
type semantics will modify the type environment).

We continue by defining the new results of computations, that is, the actual
types. In this simple imperative language, we only have a few constant types:
int, bool, string, block and stmt.

We next define the new configuration, which is actually quite simple. Besides
the <k/> cell, all we need is a type environment cell, <tenv/>, which will
hold a map from identifiers to their types. A type environment is therefore
like a state in the abstract domain of type values.
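
Concretely, the types and the configuration might be declared along these
lines (a sketch, with cell colors omitted and assuming programs are Stmts,
as in Lesson 7 of Part 4):

syntax Type ::= "int" | "bool" | "string" | "block" | "stmt"
syntax KResult ::= Type            // types are the new results

configuration <T>
                <k> $PGM:Stmts </k>
                <tenv> .Map </tenv>
              </T>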

Let us next modify the semantic rules, turning them into a type system. In
short, the idea is to reduce the basic values to their types, and then have a
rule for each language construct reducing it to its result type whenever its
arguments have the expected types.

We write the rules in the order given by the syntax declarations, to make
sure we do not forget any construct.

Integers reduce to their type, int.

So do the strings.

Variables are now looked up in the type environment and reduced to their type
there. Since we only declare integer variables in IMP++, their type in tenv
will always be int. Nevertheless, we write the rule generically, so that we
would not have to change it later if we add other type declarations to IMP++.
Note that we reject programs which lookup undeclared variables. Rejection,
in this case, means rewriting getting stuck.
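
These first rules are simple enough to sketch directly:

rule _:Int => int
rule _:String => string

rule <k> X:Id => T ...</k> <tenv>... X |-> T ...</tenv>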

Variable increment types to int, provided the variable has type int.

Read types to int, because we only allow integer input.

Division is only allowed on integers, so it rewrites to int provided that its
arguments rewrite to int. Note, however, that in order to write int / int,
we have to explicitly add int to the syntax of arithmetic expressions.
Otherwise, the K parser rightfully complains, because / was declared on
arithmetic expressions, not on types. One simple and generic way to allow
types to appear anywhere, is to define Type as a syntactic subcategory of all
the other syntactic categories. Let's do it on a by-need basis, though.

Addition is overloaded, so we add two typing rules for it: one for integers
and another for strings.
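
A sketch of these rules, with Type added to AExp on a by-need basis as
discussed above:

syntax AExp ::= Type

rule int / int => int
rule int + int => int
rule string + string => string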

As discussed, spawn types to stmt provided that its argument types to
block.

The assignment construct was strict(2); its typing policy is that the declared
type of X should be identical to the type of the assigned value. Like for
lookup, we define this rule more generically than needed for IMP++, for any
type, not only for int.
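
Using nonlinear matching on T, the typing rule might be sketched as:

rule <k> X = T:Type => T ...</k> <tenv>... X |-> T ...</tenv>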

The typing rules for Boolean expression constructs are in the same spirit.
Note that we need only one rule for &&.

The typing of blocks is a bit trickier. First, note that we still need to
recover the environment after the block is typed, because we do not want the
block-local variables to be visible in the outer type environment. We recover
the type environment only after the block-enclosed statements type; moreover,
we also opportunistically yield a block type on the computation when we
discard the type environment recovery item. To account for the fact that the
block-enclosed statement can itself be a block (e.g., {{S}}), we would need an
additional rule. Since we do not like repetition, we instead group the types
block and stmt into one syntactic category, BlockOrStmtType, and now we
can have only one rule. We also include BlockOrStmtType in Type, as a
replacement for the two basic types.
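
One plausible rendering of this scheme (a sketch; the official definition may
differ in details), with Type redeclared so that BlockOrStmtType replaces the
two basic types:

syntax BlockOrStmtType ::= "block" | "stmt"
syntax Type ::= "int" | "bool" | "string" | BlockOrStmtType

rule <k> {Ss} => Ss ~> Rho ...</k> <tenv> Rho </tenv>   // save the type environment

rule <k> (_:BlockOrStmtType ~> Rho:Map) => block ...</k>
     <tenv> _ => Rho </tenv>                            // restore it, yield block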

The expression statement types as expected. Recall that we only allow
arithmetic expressions, which type to int, to be used as statements in IMP++.

The conditional was declared strict in all its arguments. Its typing policy
is that its first argument types to bool and its two branches to block.
If that is the case, then it yields a stmt type.

For while, its first argument should type to bool and its second to block.
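
Sketches of these two rules, assuming IMP's concrete syntax:

rule if (bool) block else block => stmt
rule while (bool) block => stmt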

Variable declarations add new bindings to the type environment. Recall that
we can only declare variables of integer type in IMP++.
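
A sketch of the declaration typing rules:

rule <k> int (X,Xs => Xs); ...</k> <tenv> Rho => Rho[X <- int] </tenv>

rule int .Ids; => stmt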

The typing policy of print is that it can only print integer or string values,
and in that case it types to stmt. Like for BlockOrStmtType, to avoid
having two similar rules, one for int and another for string, we prefer to
introduce an additional syntactic category, PrintableType, which includes both
int and string types.

halt types to stmt; so its subsequent code is also typed.

join types to stmt, provided that its argument types to int.
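
Sketches of these two rules, assuming join takes its argument in parentheses
and is strict:

rule halt; => stmt
rule join(int); => stmt   // join(...) syntax assumed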

Sequential composition was declared as a whitespace-separated sequentially
strict list. Its typing policy is that all the statements in the list should
type to stmt or block in order for the list to type to stmt. Since
lists are maintained internally as cons-lists, this is probably the simplest
way to do it:

rule .Stmts => stmt
rule _:BlockOrStmtType Ss => Ss

Note that the first rule, which types the empty sequence of statements to stmt,
is needed anyway, to type empty blocks {} (together with the block rule).

kompile imp.k and krun all the programs in Part 4 of the tutorial. They
should all type to stmt.

In the next lesson we will define a substitution-based type system for LAMBDA.

Go to Lesson 2, Type Systems: Substitution-Based Higher-Order Type Systems.

MOVIE (out of date) [10'11"]

Substitution-Based Higher-Order Type Systems

In this lesson you learn how to define a substitution-based type system for
a higher-order language, namely the LAMBDA language defined in Part 1 of the
tutorial.

Let us copy the definition of LAMBDA from Part 1 of the tutorial, Lesson 8.
We are going to modify it into a type system for LAMBDA.

Before we start, it is worth clarifying an important detail, namely that
our type system will yield a type checker when executed, not a type
inferencer. In particular, we are going to change the LAMBDA syntax
to allow us to associate a type to each declared variable. The
constructs which declare variables are lambda, let, letrec and mu.
The syntax of all these will therefore change.

Since here we are not interested in a LAMBDA semantics anymore, we take the
liberty of eliminating the Val syntactic category, our previous results.
Our new results are going to be the types, because programs will now reduce
to their types.

As explained, the syntax of the lambda construct needs to change, to also
declare the type of the variable that it binds. We add the new syntactic
category Type, with the following constructs: int, bool, the function
type (which gives it its higher-order status), and parentheses as bracket.
Also, we make types our K results.
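
A sketch of these declarations (assuming the function type arrow is
right-associative):

syntax Type ::= "int" | "bool"
              | Type "->" Type   [right]   // associativity assumed
              | "(" Type ")"     [bracket]

syntax KResult ::= Type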

We are now ready to define the typing rules.

Let us start with the typing rule for lambda abstraction: lambda X : T . E
types to the function type T -> T', where T' is the type obtained by further
typing E[T/X]. This can be elegantly achieved by reducing the lambda
abstraction to T -> E[T/X], provided that we extend the function type construct
to take expressions, not only types, as arguments, and to be strict.
This can be easily achieved by redeclaring it as a strict expression construct
(strictness in the second argument would suffice in this example, but it is
more uniform to define it strict overall).
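
In code, this amounts to something like the following sketch (Type must also
be subsorted to Exp for such rules to parse, as discussed below):

syntax Exp ::= Type
             | Exp "->" Exp  [strict]   // arrow extended to expressions

rule lambda X : T . E => T -> E[T/X]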

The typing rule for application is as simple as it can get: (T1->T2) T1 => T2.

Let us now give the typing rules of arithmetic and Boolean expression
constructs. First, let us get rid of Val. Second, rewrite each value to its
type, similarly to the type system for IMP++ in the previous lesson. Third,
replace each semantic rule by its typing rule. Fourth, make sure you
do not forget to subsort Type to Exp, so your rules above will parse.
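
For example, the typing rules for a few arithmetic and comparison constructs
might be sketched as:

rule int * int => int
rule int / int => int
rule int + int => int
rule int <= int => bool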

The typing policy of the conditional statement is that its first argument
should type to bool and its other two arguments should type to the same type
T, which will also be the result type of the conditional. So we make the
conditional construct strict in all its three arguments and we write the
obvious rule: if bool then T:Type else T => T. We want a runtime check that
the latter arguments are actually typed, so we write T:Type.

There is nothing special about let, except that we have to make sure we
change its syntax to account for the type of the variable that it binds.
This rule is a macro, so the let is desugared statically.
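
A sketch of the desugaring macro:

rule let X : T = E in E' => (lambda X : T . E') E  [macro]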

Similarly, the syntax of letrec and mu needs to change to account for the
type of the variable that they bind. The typing of letrec remains based on
its desugaring to mu; we have to make sure the types are also included now.

The typing policy of mu is that its body should type to the same type T of
its variable, which is also the type of the entire mu expression. This can
be elegantly achieved by rewriting it to (T -> T) E[T/X]. Recall that
application is strict, so E[T/X] will be eventually reduced to its type.
Then the application types correctly only if that type is also T, and in
that case the result type will also be T.
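
That is, in sketch form:

rule mu X : T . E => (T -> T) E[T/X]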

kompile and krun some programs. You can, for example, take the LAMBDA
programs from the first tutorial, modify them by adding types to their
variable declarations, and then type check them using krun.

In the next lesson we will discuss an environment-based type system
for LAMBDA.

Go to Lesson 3, Type Systems: Environment-Based Higher-Order Type Systems.

MOVIE (out of date) [6'52"]

Environment-Based Higher-Order Type Systems

In this lesson you learn how to define an environment-based type system for
a higher-order language, namely the LAMBDA language defined in Part 1 of the
tutorial.

The simplest and fastest way to proceed is to copy the substitution-based
type system of LAMBDA from the previous lesson and modify it into an
environment-based one. A large portion of the substitution-based definition
will remain unchanged. We only have to modify the rules that use
substitution.

We do not need the substitution anymore, so we can remove the require and
import statements. The syntax of types and expressions stays unchanged, but
we can now remove the binder tag of lambda.

Like in the type system of IMP++ in Lesson 1, we need a configuration that
contains, besides the <k/> cell, a <tenv/> cell that will hold the type
environment.

In an environment-based definition, unlike in a substitution-based one, we
need to lookup variables in the environment. So let us start with the
type lookup rule:

rule <k> X:Id => T ...</k> <tenv>... X |-> T ...</tenv>

The type environment is populated by the semantic rule of lambda:

rule <k> lambda X : T . E => (T -> E) ~> Rho ...</k>
     <tenv> Rho => Rho[X <- T] </tenv>

So X is bound to its type T in the type environment, and then T -> E
is scheduled for processing. Recall that the arrow type construct has been
extended into a strict expression construct, so E will be eventually reduced
to its type. Like in other environment-based definitions, we need to make
sure that we recover the type environment after the computation in the scope
of the declared variable terminates.

The typing rule of application does not change, so it stays as elegant as it
was in the substitution-based definition:

rule (T1 -> T2) T1 => T2

So do the rules for the arithmetic and Boolean constructs, and those for
if, let, and letrec.

The mu rule needs to change, because it was previously defined using
substitution. We modify it in the same spirit as we modified the lambda
rule: bind X to its type in the environment, schedule its body for typing
in its right context, and then recover the type environment.
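
One plausible sketch, mirroring the lambda rule:

rule <k> mu X : T . E => (T -> T) E ~> Rho ...</k>   // (T -> T) E enforces E : T
     <tenv> Rho => Rho[X <- T] </tenv>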

Finally, we give the semantics of environment recovery, making sure
the environment is recovered only after the preceding computation is
reduced to a type:

rule <k> _:Type ~> (Rho:Map => .) ...</k> <tenv> _ => Rho </tenv>

The changes that we applied to the substitution-based definition were
therefore quite systematic: each substitution invocation was replaced with
an appropriate type environment update/recovery.

Go to Lesson 4, Type Systems: A Naive Substitution-Based Type Inferencer.

A Naive Substitution-Based Type Inferencer

In this lesson you learn how to define a naive substitution-based type
inferencer for a higher-order language, namely the LAMBDA language
defined in Part 1 of the tutorial.

Unlike in the type checker defined in Lessons 2 and 3, where we had to
associate a type with each declared variable, a type inferencer
attempts to infer the types of all the variables from the way those
variables are used. Let us take a look at this program, say plus.lambda:

lambda x . lambda y . x + y

Since x and y are used in an integer addition context, we can infer
that they must have the type int and the result of the addition is
also an int, so the type of the entire expression is int -> int -> int.
Similarly, the program if.lambda

lambda x . lambda y . lambda z .
  if x then y else z

can only make sense when x has type bool and y and z have the same
type, say t, in which case the type of the entire expression is
bool -> t -> t -> t. Since the type t can be anything, we say that
the type of this expression is polymorphic. That means that the code
above can be used in different contexts, where t can be an int, a
bool, a function type int -> int, and so on.

In the identity.lambda program

let f = lambda x . x
in f 1

f has such a polymorphic type, which is then applied to an integer,
so this program is type-safe and its type is int.

A typical polymorphic expression is the composition (composition.lambda)

lambda f . lambda g . lambda x .
  g (f x)

which has the type (t1 -> t2) -> (t2 -> t3) -> (t1 -> t3), polymorphic
in 3 types.

Let us now define our naive type inferencer and then we discuss more
examples. The idea is quite simple: we conceptually do the same
operations like we did within the type checker defined in Lesson 2,
with two important differences:

  1. instead of declaring a type with each declared variable, we assume
    a fresh type for that variable; and
  2. instead of checking that the types of expressions satisfy the
    type properties of the context in which they are used, we impose
    those properties as type equality constraints. A general-purpose
    unification-based constraint solving mechanism is then used to solve
    the generated type constraints.

Let us start with the syntax, which is essentially identical to that
of the type checker in Lesson 2, except that bound variables are not
declared a type anymore. Also, to keep things more compact, we put
all the Exp syntax declarations in one syntax declaration this time.

Before we modify the rules, let us first define our machinery for
adding and solving constraints. First, we require and import the
unification procedure. We do not discuss unification here, but if you
are interested you can consult the unification.k files under
k-distribution/include/kframework/builtin, which contains our current generic
definition of unification, which is written also in K. The generic unification
provides a sort, Mgu, for most-general-unifier, an operation
updateMgu(Mgu,T1,T2) which updates Mgu with additional constraints
generated by forcing the terms T1 and T2 to be equal, and an operation
applyMgu(Mgu,T) which applies Mgu to term T. For our use
of unification here, we do not even need to know how Mgu terms are
represented internally.

We define a K item construct, =, which takes two Type terms and
enforces them to be equal by means of updating the current Mgu.
Once the constraints are added to the Mgu, the equality dissolves
itself. With this semantics of = in mind, we can now go ahead and
modify the rules of the type checker systematically into rules
for a type inferencer. The changes are self-explanatory and
mechanical: for example, the rule

rule int * int => int

changes into the rule

rule T1:Type * T2:Type => T1 = int ~> T2 = int ~> int

generating the constraints that the two arguments of multiplication
have the type int, and the result type is int. Recall that each type
equality on the <k/> cell updates the current Mgu appropriately and
then dissolves itself; thus, the above says that after imposing the
constraints T1=int and T2=int, multiplication yields a type int.
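
For reference, here is a plausible sketch of the = construct itself, assuming
the configuration holds the current Mgu in an <mgu/> cell:

syntax KItem ::= Type "=" Type

rule <k> T1:Type = T2:Type => . ...</k>                 // dissolve once recorded
     <mgu> Theta:Mgu => updateMgu(Theta, T1, T2) </mgu> // add the constraint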

As mentioned above, since types of variables are not declared anymore,
but inferred, we have to generate a fresh type for each variable at its
declaration time, and then generate appropriately constraints for it.
For example, the type semantics of lambda and mu become:

rule lambda X . E => T -> E[T/X]  when fresh(T:Type)
rule mu X . E => (T -> T) E[T/X]  when fresh(T:Type)

that is, we add a condition stating that the previously declared type
is now a fresh one. This type will be further constrained by how the
variable X is being used within E.

Interestingly, the previous typing rule for lambda application is not
powerful enough anymore. Indeed, since types are not given anymore,
it may very well be the case that the inferred type of the first
argument of the application construct is not yet a function type
(remember, for example, the program composition.lambda above). What
we have to do is to enforce it to be a function type, by means of
fresh types and constraints. We can introduce a fresh type for the
result of the application, and then write the expected rule as
follows:

rule T1:Type T2:Type => T1 = (T2 -> T) ~> T  when fresh(T:Type)

Similarly, the conditional requires that its first argument is a
bool and that its second and third arguments have the same type,
which is also the result type.
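
In rule form, this can be written as follows (a sketch, assuming the
conditional was declared strict in all three arguments):

rule if T:Type then T1:Type else T2:Type => T = bool ~> T1 = T2 ~> T1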

The macros do not change; in particular, let is still desugared into
a lambda application. We will shortly see that this is a significant
restriction, because it limits the polymorphism of our type system.

We are done. We have a working type inferencer for LAMBDA.

Let's kompile it and krun the programs above. They all work as
expected. Let us also try some additional programs, to push it to its
limits.

First, let us test mu by means of a letrec example:

letrec f x = 3
in f

We can also try all the programs from our first tutorial, on lambda,
for example the factorial program:

letrec f x = if x <= 1 then 1 else (x * (f (x + -1)))
in (f 10)

Those programs are simple enough that they should all work as
expected with our naive type inferencer here.

Let us next try to type some tricky programs, which involve more
complex and indirect type constraints.

tricky-1.lambda:

lambda f . lambda x . lambda y . (
  (f x y) + x + (let x = y in x)
)

tricky-2.lambda:

lambda x .
  let f = lambda y . if true then y else x
  in (lambda x . f 0)

tricky-3.lambda:

lambda x . let f = lambda y . if true then x 7 else x y
           in f

tricky-4.lambda:

lambda x . let f = lambda x . x
           in let d = (f x) + 1
              in x

tricky-5.lambda:

lambda x . let f = lambda y . x y
           in let z = x 0 in f

It is now time to see the limitations of this naive type inferencer.
Consider the program

let id = lambda x . x
in if (id true) then (id 1) else (id 2)

Our type inferencer fails gracefully with a clash in the <mgu/> cell
between int and bool. Indeed, the desugaring macro of let turns it
into a lambda and an application, which further enforce id to have a
type of the form t -> t for some fresh type t. The first use of id
in the condition of if will then constrain t to be bool, while the
other uses in the two branches will enforce t to be int. Thus the
clash in the <mgu/> cell.

Similarly, the program

let id = lambda x . x
in id id

yields a different kind of conflict: if id has type t -> t, in order
to apply id to itself it must be the case that its argument, t, equals
t -> t. These two type terms cannot be unified because there is a
circular dependence on t, so we get a cycle in the <mgu/> cell.

Both limitations above will be solved when we change the semantics of
let later on, to account for the desired polymorphism.

Before we conclude this lesson, let us see one more interesting
example, where the lack of let-polymorphism leads not to a type error,
but to a less generic type:

let f1 = lambda x . x in
  let f2 = f1 in
    let f3 = f2 in
      let f4 = f3 in
        let f5 = f4 in
          if (f5 true) then f2 else f3

Our current type inferencer will infer the type bool -> bool for the
program above. Nevertheless, since all functions f1, f2, f3, f4, f5
are the identity function, which is polymorphic, we would expect the
entire program to type to the same polymorphic identity function type.

This limitation will also be addressed when we define our
let-polymorphic type inferencer.

Before that, in the next lesson we will show how easily we can turn
the naive substitution-based type inferencer discussed in this lesson
into a similarly naive, but environment-based type inferencer.

Go to Lesson 5, Type Systems: A Naive Environment-Based Type Inferencer.

A Naive Environment-Based Type Inferencer

In this lesson you learn how to define a naive environment-based type
inferencer for a higher-order language. Specifically, we take the
substitution-based type inferencer for LAMBDA defined in Lesson 4 and
turn it into an environment-based one.

Recall from Lesson 3, where we defined an environment-based type
checker for LAMBDA based on the substitution-based one in Lesson 2,
that the transition from a substitution-based definition to an
environment-based one was quite systematic and mechanical: each
substitution occurrence E[T/X] is replaced by E, but at the same time
the variable X is bound to type T in the type environment. One benefit
of using type environments instead of substitution is that we replace
a linear complexity operation (the substitution) with a constant
complexity one (the variable lookup).

There is not much left to say that has not already been said in
Lesson 3: we remove the unnecessary binder annotations for the
variable binding operations, then add a <tenv/> cell to the
configuration to hold the type environment, then add a new rule for
variable lookup, and finally apply the transformation of substitutions
E[T/X] into E as explained above.
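
For example, the new variable lookup rule can be as simple as the
following sketch, assuming the <tenv/> cell holds a map from
variables to their types:

rule <k> X:Id => T ...</k>
     <tenv>... X |-> T ...</tenv>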

The resulting type inferencer should now work exactly the same way as
the substitution-based one, except, of course, that the resulting
configurations will contain a <tenv/> cell now.

As a sanity check, let us consider two more LAMBDA programs that test
the static scoping nature of the inferencer. We do that because
faulty environment-based definitions often have this problem. The
program

let x = 1
in let f = lambda a . x
   in let x = true
      in f 3

should type to int, not to bool, and so it does. Similarly, the
program

let y = 0
in letrec f x = if x <= 0
                then y
                else let y = true
                     in f (x + 1)
   in f 1

should also type to int, not bool, and so it does, too.

The type inferencer defined in this lesson has the same limitations,
in terms of polymorphism, as the one in Lesson 4. In the next
lesson we will see how it can be parallelized, and in further lessons
how to make it polymorphic.

Go to Lesson 6, Type Systems: Parallel Type Checkers/Inferencers.

Parallel Type Checkers/Inferencers

In this lesson you learn how to define parallel type checkers or
inferencers. To have a concrete example, we will parallelize the one
in the previous lesson, but the ideas are general. We use the same
idea to define type checkers for other languages in the K tool
distribution, such as SIMPLE and KOOL.

The idea is in fact quite simple. Instead of one monolithic typing
task, we generate many smaller tasks, which can be processed in
parallel. We use the same approach to define parallel semantics as we
used for threads in IMP++ in Part 4 of the tutorial, that is, we add a
cell holding all the parallel tasks, making sure we declare the cell
holding a task with multiplicity *. For the particular type
inferencer that we chose here, the one in Lesson 5, each task will
hold an expression to type together with a type environment (so it
knows where to look up its free variables). We thus have the
following configuration:

configuration <tasks color="yellow">
                <task color="orange" multiplicity="*">
                  <k color="green"> $PGM:Exp </k>
                  <tenv color="red"> .Map </tenv>
                </task>
              </tasks>
              <mgu color="blue"> .Mgu </mgu>

Now we have to take each typing rule we had before and change it to
yield parallel typing. For example, our rule for typing
multiplication was the following in Lesson 5:

rule T1:Type * T2:Type => T1 = int ~> T2 = int ~> int

Since * was strict, its two arguments eventually type, and once that
happens the rule above fires. Unfortunately, the strictness of
multiplication makes the typing of the two expressions sequential in
our previous definition. To avoid typing the two expressions
sequentially and instead generating two parallel tasks, we remove the
strict attribute of multiplication and replace the rule above with the
following:

rule <k> E1 * E2 => int ...</k> <tenv> Rho </tenv>
     (. => <task> <k> E1 = int </k> <tenv> Rho </tenv> </task>
           <task> <k> E2 = int </k> <tenv> Rho </tenv> </task>)

Therefore, we generate two tasks for typing E1 and E2 in the same type
environment as the current task, and let the current task continue by
simply optimistically reducing E1*E2 to its expected result type, int.
If E1 or E2 does not type to int, then either its corresponding task
will get stuck or the <mgu/> cell will end up with a clash or a cycle,
so the program will not type overall, despite the fact that we
allowed the task containing the multiplication to continue. This is
how we obtain maximal parallelism in this case.

Before we continue, note that the new tasks hold equalities whose
first argument is an expression, while previously the equality
construct was declared to take types. What we want now is for the
equality construct to take arbitrary expressions, first type them,
and then generate the type constraint like before. This can be done
very easily by just extending the equality construct to expressions
and declaring it strict:

syntax KItem ::= Exp "=" Exp  [strict]

Unlike before, where we only passed types to the equality construct,
we now need a runtime check that its arguments are indeed types before
we can generate the updateMgu command:

rule <k> T:Type = T':Type => . ...</k>
     <mgu> Theta:Mgu => updateMgu(Theta,T,T') </mgu>

Like before, an equality will therefore update the <mgu/> cell and
then dissolve itself, leaving the <k/> cell of the corresponding task
empty. Such empty tasks are unnecessary, so they can be erased:

rule <task>... <k> . </k> ...</task> => .

We can now follow the same style as for multiplication to write the
parallel typing rules of the other arithmetic constructs, and even for
the conditional.
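
For example, the conditional can be handled with a rule like the
following sketch (dropping its strictness, as we did for
multiplication):

rule <k> if B then E1 else E2 => T ...</k> <tenv> Rho </tenv>
     (. => <task> <k> B = bool </k> <tenv> Rho </tenv> </task>
           <task> <k> E1 = T </k> <tenv> Rho </tenv> </task>
           <task> <k> E2 = T </k> <tenv> Rho </tenv> </task>)
  when fresh(T:Type)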

To parallelize the typing of lambda we generate two fresh types, one
for the variable and one for the body, and make sure that we generate
the correct type constraint and environment in the body task:

rule <k> lambda X . E => Tx -> Te ...</k> <tenv> TEnv </tenv>
     (. => <task> <k> E = Te </k> <tenv> TEnv[Tx/X] </tenv> </task>)
  when fresh(Tx:Type) andBool fresh(Te:Type)

Note that the rule above also spares us from having to change and
then recover the environment of the current task.

For function application we also need to generate two fresh types:

rule <k> E1 E2 => T ...</k> <tenv> Rho </tenv>
     (. => <task> <k> E1 = T2 -> T </k> <tenv> Rho </tenv> </task>
           <task> <k> E2 = T2 </k> <tenv> Rho </tenv> </task>)
  when fresh(T2:Type) andBool fresh(T:Type)

The only rule left is that of mu X . E. In this case we only need
one fresh type, because X, E, and mu X . E all have the same type:

rule <k> mu X . E => T ...</k>  <tenv> TEnv </tenv>
     (. => <task> <k> E = T </k> <tenv> TEnv[T/X] </tenv> </task>)
  when fresh(T:Type)

We do not need the type environment recovery operation, so we delete it.

We can now kompile and krun all the programs that we typed in Lesson 5.
Everything should work.

In this lesson we only aimed at parallelizing the type inferencer in
Lesson 5, not at improving its expressiveness; it still has the same
limitations in terms of polymorphism. The next lessons are dedicated
to polymorphic type inferencers.

Go to Lesson 7, Type Systems: A Naive Substitution-based Polymorphic Type Inferencer.

A Naive Substitution-based Polymorphic Type Inferencer

In this lesson you learn how little it takes to turn a naive monomorphic
type inferencer into a naive polymorphic one, basically only changing
a few characters. In terms of the K framework, you will learn that
you can have complex combinations of substitutions in K, both over
expressions and over types.

Let us start directly with the change. All we have to do is to take
the LAMBDA type inferencer in Lesson 4 and only change the macro

rule let X = E in E' => (lambda X . E') E  [macro]

as follows:

rule let X = E in E' => E'[E/X]  [macro]

In other words, we are inlining the beta-reduction rule of
lambda-calculus within the original rule. In terms of typing, the
rule above forces the type inferencer to type E in place for each
occurrence of X in E'. Unlike with the first rule, where X had to get
a single type satisfying the constraints of all of X's occurrences in
E', we now never associate any type to X at all.

Let us kompile and krun some examples. Everything that worked with
the type inferencer in Lesson 4 should still work here, although the
types of some programs can now be more general. For example, reconsider
the nested-lets.lambda program

let f1 = lambda x . x in
  let f2 = f1 in
    let f3 = f2 in
      let f4 = f3 in
        let f5 = f4 in
          if (f5 true) then f2 else f3

which was previously typed to bool -> bool. With the new rule above,
the sequence of lets is iteratively eliminated and we end up with the
program

if (lambda x . x) true then (lambda x . x) else (lambda x . x)

which now types (with both type inferencers) to a type of the form
t -> t, for some type variable t, which is more general than the
previous bool -> bool type that the program typed to in Lesson 4.

We can also now type programs that were not typable before, such as

let id = lambda x . x
in if (id true) then (id 1) else (id 2)

and

let id = lambda x . x
in id id

Let us also test it on some trickier programs, also not typable
before, such as

let f = lambda x . x
in let g = lambda y . f y
   in g g

which gives us a type of the form t -> t for some type variable t,
and

let f = let g = lambda x . x
        in let h = lambda x . lambda x . (g g g g)
           in h
in f

which types to t1 -> t2 -> t3 -> t3 for some type variables t1, t2, t3.

Here is another program which was not typable before, which is
trickier than the others above in that a lambda-bound variable appears
free in a let-bound expression:

lambda x . (
  let y = lambda z . x
  in if (y true) then (y 1) else (y (lambda x . x))
)

The above presents no problem now, because once lambda z . x gets
substituted for y we get a well-typed expression which yields that x
has the type bool, so the entire expression types to bool -> bool.

The cheap type inferencer that we obtained above therefore works as
expected. However, it has two problems which justify a more advanced
solution. First, substitution is typically considered an elegant
mathematical instrument which is not too practical in implementations,
so an implementation of this type inferencer will likely be based on
type environments anyway. Additionally, we mix two kinds of
substitutions in this definition, one where we substitute types and
another where we substitute expressions, which can only make things
harder to implement efficiently. Second, our naive substitution of E
for X in E' can yield an exponential explosion in the size of the
original program. Consider, for example, the following classic
example, which is known to generate a type whose size is exponential
in the size of the program (and is thus used as an argument for why
let-polymorphic type inference is exponential in the worst case):

let f00 = lambda x . lambda y . x in
  let f01 = lambda x . f00 (f00 x) in
    let f02 = lambda x . f01 (f01 x) in
      let f03 = lambda x . f02 (f02 x) in
        let f04 = lambda x . f03 (f03 x) in
          // ... you can add more nested lets here
          f04

The particular instance of the pattern above generates a type with
17 distinct type variables! The desugaring of each let doubles the
size of the program and of its resulting type, so the type of f0n
contains 2^n + 1 type variables. While such programs are unlikely to
appear in practice, it is often the case that functions can be quite
complex and large while their type is quite simple in the end, so we
should simply avoid retyping each function each time it is used.

This is precisely what we will do next. Before we present the classic
let-polymorphic type inferencer in Lesson 9, which is based on
environments, we first quickly discuss in Lesson 8 an intermediate
step, namely a naive environment-based variant of the inferencer
defined here.

Go to Lesson 8, Type Systems: A Naive Environment-based Polymorphic Type Inferencer.

A Naive Environment-based Polymorphic Type Inferencer

In this short lesson we discuss how to quickly turn a naive
environment-based monomorphic type inferencer into a naive let-polymorphic
one. Like in the previous lesson, we only need to change a few
characters. In terms of the K framework, you will learn how to have
both environments and substitution in the same definition.

Like in the previous lesson, all we have to do is to take the LAMBDA
type inferencer in Lesson 5 and only change the macro

rule let X = E in E' => (lambda X . E') E  [macro]

as follows:

rule let X = E in E' => E'[E/X]  [macro]

The reasons why this works have already been explained in the previous
lesson, so we do not repeat them here.

Since our new let macro uses substitution, we have to require the
substitution module at the top and also import SUBSTITUTION in the
current module, besides the already existing UNIFICATION.
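
Schematically, the top of the definition then looks something like
the following (the exact require paths depend on the K distribution,
so treat them as placeholders):

require "substitution.k"
require "unification.k"

module LAMBDA
  imports SUBSTITUTION
  imports UNIFICATION
  ...
endmodule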

Everything which worked with the type inferencer in Lesson 7 should
also work now. Let us only try the exponential type example,

let f00 = lambda x . lambda y . x in
  let f01 = lambda x . f00 (f00 x) in
    let f02 = lambda x . f01 (f01 x) in
      let f03 = lambda x . f02 (f02 x) in
        let f04 = lambda x . f03 (f03 x) in
          f04

As expected, this gives us precisely the same type as in Lesson 7.

So the only difference between this type inferencer and the one in
Lesson 7 is that substitution is only used for LAMBDA-to-LAMBDA
transformations, but not for infusing types within LAMBDA programs.
Thus, the syntax of LAMBDA programs is preserved intact, which some
may prefer. Nevertheless, this type inferencer is still expensive and
wasteful, because the let-bound expression is typed over and over
again in each place where the let-bound variable occurs.

In the next lesson we will discuss a type inferencer based on the
classic Damas-Hindley-Milner type system, which maximizes the reuse of
typing work by means of parametric types.

Go to Lesson 9, Type Systems: Let-Polymorphic Type Inferencer (Damas-Hindley-Milner).

Let-Polymorphic Type Inferencer (Damas-Hindley-Milner)

In this lesson we discuss a type inferencer based on what we call today
the Damas-Hindley-Milner type system, which is at the core of many
modern functional programming languages. The first variant of it was
proposed by Hindley in 1969, then, interestingly, Milner rediscovered
it in 1978 in the context of the ML language. Damas formalized it as
a type system in his PhD thesis in 1985. More specifically, our type
inferencer here, like many others as well as many implementations of
it, follows more closely the syntax-driven variant proposed by Clement
in 1987.

In terms of K, we will see how easily we can turn one definition which
is considered naive (our previous type inferencer in Lesson 8) into a
definition which is considered advanced. All we have to do is to
change one existing rule (the rule of the let binder) and to add a new
one. We will also learn some new predefined features of K, which make
the above possible.

The main idea is to replace the rule

rule let X = E in E' => E'[E/X]  [macro]

which creates potentially many copies of E within E', with a rule
that types E once and then reuses that type in each place where X
occurs free in E'. The simplest K way to type E is to declare the
let construct strict(2). Now we cannot simply bind X to the type
of E, because we would obtain a variant of the naive type inferencer
we already discussed, together with its limitations, in Lesson 5 of this
tutorial. The trick here is to parameterize the type of E in all its
unconstrained fresh types, and then create fresh copies of those
parameters in each free occurrence of X in E'.

Let us discuss some examples, before we go into the technical details.
Consider the first let-polymorphic example which failed to be typed
with our first naive type-inferencer:

let id = lambda x . x
in if (id true) then (id 1) else (id 2)

When typing lambda x . x, we get a type of the form t -> t, for some
fresh type t. Instead of assigning this type to id as we did in the
naive type inferencers, we now first parameterize this type in its
fresh variable t, written

(forall t) t -> t

and then bind id to this parametric type. The intuition for the
parameter is that it can be instantiated with any other type, so this
parametric type stands, in fact, for infinitely many non-parametric
types. This is similar to what happens in formal logic proof systems,
where rule schemas stand for infinitely many concrete instances of
them. For this reason, parametric types are also called type schemas.

Now each time id is looked up within the let-body, we create a fresh
copy of the parameter t, which can thus be independently
constrained by each local context. Let's suppose that the three id
lookups yield the types t1 -> t1, t2 -> t2, and t3 -> t3,
respectively.
Then t1 will be constrained to be bool, and t2 and t3 to be int,
so we can now safely type the program above to int.

Therefore, a type schema comprises a summary of all the typing work
that has been done for typing the corresponding expression, and an
instantiation of its parameters with fresh copies represents an
elegant way to reuse all that typing work.

There are some subtleties regarding what fresh types can be made
parameters. Let us consider another example, discussed as part of
Lesson 7 on naive let-polymorphism:

lambda x . (
  let y = lambda z . x
  in if (y true) then (y 1) else (y (lambda x . x))
)

This program should type to bool -> bool, as explained in Lesson 7.
The lambda construct will bind x to some fresh type tx. Then the
let-bound expression lambda z . x types to tz -> tx for some
additional fresh type tz. The question now is what should the
parameters of this type be when we generate the type schema? If we
naively parameterize in all fresh variables, that is in both tz and
tx obtaining the type schema (forall tz,tx) tz -> tx, then there will
be no way to infer that the type of x, tx, must be a bool! The
inferred type of this expression would then wrongly be tx -> t for
some fresh types tx and t. That's because the parameters are replaced
with fresh copies in each occurrence of y, and thus their relationship
to the original x is completely lost. This tells us that we cannot
parameterize in all fresh types that appear in the type of the
let-bound expression. In particular, we cannot parameterize in those
types to which some variables are already bound in the current type
environment (like x is bound to tx in our example above).
In our example, the correct type schema is (forall tz) tz -> tx,
which now allows us to correctly infer that tx is bool.

Let us now discuss another example, which should fail to type:

lambda x .
  let f = lambda y . x y
  in if (f true) then (f 1) else (f 2)

This should fail to type because lambda y . x y is equivalent to x,
so the conditional imposes conflicting constraints on x: it should
be a function whose argument is a bool and also one whose argument is
an int. Let us try to
type it using our currently informal procedure. Like in the previous
example, x will be bound to a fresh type tx. Then the let-bound
expression types to ty -> tz with ty and tz fresh types, adding also
the constraint tx = ty -> tz. What should the parameters of this type
be? If we ignore the type constraint and simply make both ty and tz
parameters because no variable is bound to them in the type
environment (indeed, the only variable x in the type environment is
bound to tx), then we can wrongly type this program to tx -> tz
following a reasoning similar to the one in the example above.
In fact, in this example, neither ty nor tz can be a parameter,
because they are constrained by tx.

The examples above tell us two things: first, that we have to take the
type constraints into account when deciding the parameters of the
schema; second, that after applying the most-general-unifier solution
given by the type constraints everywhere, the remaining fresh types
appearing anywhere in the type environment are consequently constrained
and cannot be turned into parameters. Since the type environment can in
fact also hold type schemas, which already bind some types, we only need
to ensure that none of the fresh types appearing free anywhere in the
type environment are turned into parameters of type schemas.

Thanks to generic support offered by the K tool, we can easily achieve
all the above as follows.

First, add syntax for type schemas:

syntax TypeSchema ::= "(" "forall" Set ")" Type  [binder]

The definition below will be given in such a way that the Set argument
of a type schema will always be a set of fresh types. We also declare
this construct to be a binder, so that we can make use of the generic
free variable function provided by the K tool.

We now replace the old macro of let

rule let X = E in E' => E'[E/X]  [macro]

with the following rule:

rule <k> let X = T:Type in E => E ~> tenv(TEnv) ...</k>
     <mgu> Theta:Mgu </mgu>
     <tenv> TEnv
      => TEnv[(forall freeVariables(applyMgu(Theta, T)) -Set
                      freeVariables(applyMgu(Theta, values TEnv))
              ) applyMgu(Theta, T) / X]
     </tenv>

So the type T of E is being parameterized and then bound to X in the
type environment. The current mgu Theta, which comprises all the type
constraints accumulated so far, is applied to both T and the types in
the type environment. The remaining fresh types in T which do not
appear free in the type environment are then turned into type parameters.
The function freeVariables returns, as expected, the free variables of
its argument as a Set; this is why we declared the type schema to be a
binder above.

Now a LAMBDA variable in the type environment can be bound to either a
type or a type schema. In the first case, the previous rule we had
for variable lookup can be reused, but we have to make sure we check
that T there is of sort Type (adding a sort membership, for example).
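
For example, the reused lookup rule can be the following sketch,
where the sort membership on T is the only change with respect to
the lookup rule of Lesson 5:

rule <k> X:Id => T ...</k>
     <tenv>... X |-> T:Type ...</tenv>
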
In the second case, as explained above, we have to create fresh copies
of the parameters. This can be easily achieved with another
predefined K function, as follows:

rule <k> X:Id => freshVariables(Tvs,T) ...</k>
     <tenv>... X |-> (forall Tvs) T ...</tenv>

Indeed, freshVariables takes a set of variables and a term, and returns the
same term but with each of the given variables replaced by a fresh copy.
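
For example, if id is bound in the type environment to the schema
(forall t) t -> t, then a lookup of id rewrites to freshVariables
applied to the set containing t and to the type t -> t, yielding a
fresh instance such as t1 -> t1, exactly as in the informal
discussion above.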

The operations freeVariables and freshVariables are useful in many K
definitions, so they are predefined in module substitution.k.

Our definition of this let-polymorphic type inferencer is now
complete. To test it, kompile it and then krun all the LAMBDA
programs discussed since Lesson 4. They should all work as expected.
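
Assuming the definition is saved in a file lambda.k and a test
program in, say, id.lambda (both file names are placeholders here),
this amounts to commands along the lines of:

kompile lambda.k
krun id.lambda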

K Languages

Here we present several "real-world" language examples. These languages
demonstrate many of the features you would expect to find in a full-fledged
programming language.

  • SIMPLE: Imperative programming language with threads.
  • KOOL: SIMPLE extended with object-oriented features.
  • FUN: A functional language with algebraic data-types and pattern-matching.
  • LOGIK: A logical programming language based on clause unification.

SIMPLE — Untyped

Author: Grigore Roșu (grosu@illinois.edu)
Organization: University of Illinois at Urbana-Champaign

Author: Traian Florin Șerbănuță (traian.serbanuta@unibuc.ro)
Organization: University of Bucharest

Abstract

This is the K semantic definition of the untyped SIMPLE language.
SIMPLE is intended to be a pedagogical and research language that captures
the essence of the imperative programming paradigm, extended with several
features often encountered in imperative programming languages.
A program consists of a set of global variable declarations and
function definitions. Like in C, function definitions cannot be
nested and each program must have one function called main,
which is invoked when the program is executed. To make it more
interesting and to highlight some of K's strengths, SIMPLE includes
the following features in addition to the conventional imperative
expression and statement constructs:

  • Multidimensional arrays and array references. An array evaluates
    to an array reference, which is a special value holding a location (where
    the elements of the array start) together with the size of the array;
    the elements of the array can be array references themselves (particularly
    when the array is multi-dimensional). Array references are ordinary values,
    so they can be assigned to variables and passed/received by functions.

  • Functions and function values. Functions can have zero or
    more parameters and can return abruptly using a return statement.
    SIMPLE follows a call-by-value parameter passing style, with static scoping.
    Function names evaluate to function abstractions, which thereby become
    ordinary values in the language, just like array references.

  • Blocks with locals. SIMPLE variables can be declared
    anywhere, their scope being from the place where they are declared
    until the end of the most nested enclosing block.

  • Input/Output. The expression read() evaluates to the
    next value in the input buffer, and the statement write(e)
    evaluates e and outputs its value to the output buffer. The
    input and output buffers are lists of values.

  • Exceptions. SIMPLE has parametric exceptions (the value thrown as
    an exception can be caught and bound).

  • Concurrency via dynamic thread creation/termination and
    synchronization. One can spawn a thread to execute any statement.
    The spawned thread shares with its parent its environment at creation time.
    Threads can be synchronized via a join command which blocks the current thread
    until the joined thread completes, via re-entrant locks which can be acquired
    and released, as well as through rendezvous commands.

Like in many other languages, some of SIMPLE's constructs can be
desugared into a smaller set of basic constructs. We do that at the end
of the syntax module, and then we only give semantics to the core constructs.

Note: This definition is commented slightly more than others, because
it is intended to be one of the first non-trivial definitions that a
new user of K sees. We recommend that beginners first check the
language definitions discussed in the K tutorial.

module SIMPLE-UNTYPED-SYNTAX
  imports DOMAINS-SYNTAX

Syntax

We start by defining the SIMPLE syntax. The language constructs discussed
above have the expected syntax and evaluation strategies. Recall that in K
we annotate the syntax with appropriate strictness attributes, thus giving
each language construct the desired evaluation strategy.

Identifiers

Recall from the K tutorial that identifiers are builtin and come under the
syntactic category Id. The special identifier for the function
main belongs to all programs, and plays a special role in the semantics,
so we declare it explicitly. This would not be necessary if the identifiers
were all included automatically in semantic definitions, but that is not
possible because of parsing reasons (e.g., K variables used to match
concrete identifiers would then be ambiguously parsed as identifiers). They
are only included in the parser generated to parse programs (and used by the
kast tool). Consequently, we have to explicitly declare all the
concrete identifiers that play a special role in the semantics, like
main below.

  syntax Id ::= "main" [token]

Declarations

There are two types of declarations: for variables (including arrays) and
for functions. We are going to allow declarations of the form
var x=10, a[10,10], y=23;, which is why we allow the var
keyword to take a list of expressions. The non-terminals used in the two
productions below are defined shortly.

  syntax Stmt ::= "var" Exps ";"
                | "function" Id "(" Ids ")" Block

Expressions

The expression constructs below are standard. Increment (++) takes
an expression rather than a variable because it can also increment an array
element. Recall that the syntax we define in K is what we call the
syntax of the semantics: while powerful enough to define non-trivial
syntaxes
(thanks to the underlying SDF technology that we use), we typically refrain
from defining precise syntaxes, that is, ones which accept precisely the
well-formed programs (that would not be possible anyway in general). That job
is deferred to type systems, which can also be defined in K. In other words,
we are not making any effort to guarantee syntactically that only variables
or array elements are passed to the increment construct, we allow any
expression. Nevertheless, we will only give semantics to those, so expressions
of the form ++5, which parse (but which will be rejected by our type
system in the typed version of SIMPLE later), will get stuck when executed.
Arrays can be multidimensional and can hold other arrays, so the
array lookup operation takes a list of expressions as argument and
applies to an expression (which can in particular be another array
lookup).
The construct sizeOf gives the size of an array in number of elements
of its first dimension. Note that almost all constructs are strict.
The only constructs which are not strict are: the increment (since
its first argument gets updated, so it cannot be evaluated); the
input read, which takes no arguments, so strictness is irrelevant for
it; the logical and and or constructs, which are short-circuited; the
thread spawning construct, which creates a new thread executing the
argument expression and returns its unique identifier to the creating
thread (so it cannot just evaluate its argument in place); and the
assignment, which is only strict in its second argument (for the same
reason as the increment).

  syntax Exp ::= Int | Bool | String | Id
               | "(" Exp ")"             [bracket]
               | "++" Exp
               > Exp "[" Exps "]"        [strict]
               > Exp "(" Exps ")"        [strict]
               | "-" Exp                 [strict]
               | "sizeOf" "(" Exp ")"    [strict]
               | "read" "(" ")"
               > left:
                 Exp "*" Exp             [strict, left]
               | Exp "/" Exp             [strict, left]
               | Exp "%" Exp             [strict, left]
               > left:
                 Exp "+" Exp             [strict, left]
               | Exp "-" Exp             [strict, left]
               > non-assoc:
                 Exp "<" Exp             [strict, non-assoc]
               | Exp "<=" Exp            [strict, non-assoc]
               | Exp ">" Exp             [strict, non-assoc]
               | Exp ">=" Exp            [strict, non-assoc]
               | Exp "==" Exp            [strict, non-assoc]
               | Exp "!=" Exp            [strict, non-assoc]
               > "!" Exp                 [strict]
               > left:
                 Exp "&&" Exp            [strict(1), left]
               | Exp "||" Exp            [strict(1), left]
               > "spawn" Block
               > Exp "=" Exp             [strict(2), right]

We also need comma-separated lists of identifiers and of expressions.
Moreover, we want them to be strict, that is, to evaluate to lists of results
whenever requested (e.g., when they appear as strict arguments of
the constructs above).

  syntax Ids  ::= List{Id,","}
  syntax Exps ::= List{Exp,","}          [strict]  // automatically hybrid now
  syntax Exps ::= Ids
  syntax Val
  syntax Vals ::= List{Val,","}
  syntax Bottom
  syntax Bottoms ::= List{Bottom,","}
  syntax Ids ::= Bottoms

Statements

Most of the statement constructs are standard for imperative languages.
We syntactically distinguish between empty and non-empty blocks, because we
chose Stmts not to be a (;-separated) list of
Stmt. Variables can be declared anywhere inside a block, their scope
ending with the block. Expressions are allowed to be used for their side
effects only (followed by a semicolon ;). Functions are allowed
to abruptly return. The exceptions are parametric, i.e., one can throw a value
which is bound to the variable declared by catch. Threads can be
dynamically created and terminated, and can synchronize with join,
acquire, release and rendezvous. Note that the
strictness attributes obey the intended evaluation strategy of the various
constructs. In particular, the if-then-else construct is strict only in its
first argument (the if-then construct will be desugared into if-then-else),
while the loop constructs are not strict in any arguments. The print
statement construct is variadic, that is, it takes an arbitrary number of
arguments.

  syntax Block ::= "{" "}"
                | "{" Stmt "}"

  syntax Stmt ::= Block
                | Exp ";"                               [strict]
                | "if" "(" Exp ")" Block "else" Block   [avoid, strict(1)]
                | "if" "(" Exp ")" Block
                | "while" "(" Exp ")" Block
                | "for" "(" Stmt Exp ";" Exp ")" Block
                | "return" Exp ";"                      [strict]
                | "return" ";"
                | "print" "(" Exps ")" ";"              [strict]
// NOTE: declaring print strict allows non-deterministic evaluation of its
// arguments. Either keep it like this but document it, make Exps seqstrict,
// or define and use a different expression list here which is seqstrict.
                | "try" Block "catch" "(" Id ")" Block
                | "throw" Exp ";"                       [strict]
                | "join" Exp ";"                        [strict]
                | "acquire" Exp ";"                     [strict]
                | "release" Exp ";"                     [strict]
                | "rendezvous" Exp ";"                  [strict]

The reason we allow a statement sequence as the first argument of
for, rather than a single statement, is because we want to allow more
than one statement to be executed when the loop is initialized.
Also, as seen shortly, macros may expand one statement into several
statements; for example, an initialized variable declaration
statement var x=0; desugars into two statements, namely
var x; x=0;, so if we insisted on a single statement in the
production of for above then we would risk that the macro expansion
of statement var x=0; happens before the macro expansion of for,
also shown below, in which case the latter would not apply anymore
because of the syntactic mismatch.

  syntax Stmt ::= Stmt Stmt                          [right]

// I wish I were able to write the following instead, but it confuses the parser.
//
// syntax Stmts ::= List{Stmt,""}
// syntax Top ::= Stmt | "function" Id "(" Ids ")" Block
// syntax Pgm ::= List{Top,""}
//
// With that, I could have also eliminated the empty block

Desugared Syntax

This part desugars some of SIMPLE's language constructs into core ones.
We only want to give semantics to core constructs, so we get rid of the
derived ones before we start the semantics. All desugaring macros below are
straightforward.

  rule if (E) S => if (E) S else {}                                 [macro]
  rule for(Start Cond; Step) {S} => {Start while (Cond) {S Step;}}  [macro]
  rule for(Start Cond; Step) {} => {Start while (Cond) {Step;}}     [macro]
  rule var E1:Exp, E2:Exp, Es:Exps; => var E1; var E2, Es;          [macro-rec]
  rule var X:Id = E; => var X; X = E;                               [macro]
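
For example, under the macros above the fragment

  var x = 0;
  if (x <= 0) { x = 1; }

desugars into

  var x; x = 0;
  if (x <= 0) { x = 1; } else {}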

For the semantics, we can therefore assume from now on that each
conditional has both branches, that there are only while loops, and
that each variable is declared alone and without any initialization as part of
the declaration.

endmodule


module SIMPLE-UNTYPED
  imports SIMPLE-UNTYPED-SYNTAX
  imports DOMAINS

Basic Semantic Infrastructure

Before one starts adding semantic rules to a K definition, one needs to
define the basic semantic infrastructure consisting of definitions for
values and configuration. As discussed in the definitions
in the K tutorial, the values are needed to know when to stop applying
the heating rules and when to start applying the cooling rules corresponding
to strictness or context declarations. The configuration serves as a backbone
for the process of configuration abstraction which allows users to only
mention the relevant cells in each semantic rule, the rest of the configuration
context being inferred automatically. Although in some cases the configuration
could be automatically inferred from the rules, we believe that it is very
useful for language designers/semanticists to actually think of and design
their configuration explicitly, so the current implementation of K requires
one to define it.

Values

We here define the values of the language that the various fragments of
programs evaluate to. First, integers and Booleans are values. As discussed,
arrays evaluate to special array reference values holding (1) a location from
where the array's elements are contiguously allocated in the store, and
(2) the size of the array. Functions evaluate to function values as
λ-abstractions (we do not need to evaluate functions to closures
because each function is executed in the fixed global environment and
function definitions cannot be nested). Like in IMP and other
languages, we finally tell the tool that values are K results.

  syntax Val ::= Int | Bool | String
               | array(Int,Int)
               | lambda(Ids,Stmt)
  syntax Exp ::= Val
  syntax Exps ::= Vals
  syntax Vals ::= Bottoms
  syntax KResult ::= Val
                   | Vals  // TODO: should not need this

The inclusion of values in expressions follows the methodology of
syntactic definitions (like, e.g., in SOS): extend the syntax of the language
to encompass all values and additional constructs needed to give semantics.
In addition to that, it allows us to write the semantic rules using the
original syntax of the language, and to parse them with the same (now extended
with additional values) parser. If writing the semantics directly on the K
AST, using the associated labels instead of the syntactic constructs, then one
would not need to include values in expressions.

Configuration

The K configuration of SIMPLE consists of a top level cell, T,
holding a threads cell, a global environment map cell genv
mapping the global variables and function names to their locations, a shared
store map cell store mapping each location to some value, a set cell
busy holding the locks which have been acquired but not yet released
by threads, a set cell terminated holding the unique identifiers of
the threads which already terminated (needed for join), input
and output list cells, and a nextLoc cell holding a natural
number indicating the next available location. Unlike in the small languages
in the K tutorial, where we used the fresh predicate to generate fresh
locations, in larger languages, like SIMPLE, we prefer to explicitly manage
memory. The location counter in nextLoc models an actual physical
location in the store; for simplicity, we assume arbitrarily large memory and
no garbage collection. The threads cell contains one thread
cell for each existing thread in the program. Note that the thread cell has
multiplicity *, which means that at any given moment there could be zero,
one or more thread cells. Each thread cell contains a
computation cell k, a control cell holding the various
control structures needed to jump to certain points of interest in the program
execution, a local environment map cell env mapping the thread local
variables to locations in the store, and finally a holds map cell
indicating what locks have been acquired by the thread and not released so far
and how many times (SIMPLE's locks are re-entrant). The control cell
currently contains only two subcells, a function stack fstack which
is a list and an exception stack xstack which is also a list.
One can add more control structures in the control cell, such as a
stack for break/continue of loops, etc., if the language is extended with more
control-changing constructs. Note that all cells except for k are
also initialized, in that they contain a ground term of their corresponding
sort. The k cell is initialized with the program that will be passed
to the K tool, as indicated by the $PGM variable, followed by the
execute task (defined shortly).

  // the syntax declarations below are required because the sorts are
  // referenced directly by a production and, because of the way KIL to KORE
  // is implemented, the configuration syntax is not available yet
  // should simply work once KIL is removed completely
  // check other definitions for this hack as well

  syntax ControlCell
  syntax ControlCellFragment

  configuration <T color="red">
                  <threads color="orange">
                    <thread multiplicity="*" color="yellow">
                      <k color="green"> $PGM:Stmt ~> execute </k>
                    //<br/> // TODO(KORE): support latex annotations #1799
                      <control color="cyan">
                        <fstack color="blue"> .List </fstack>
                        <xstack color="purple"> .List </xstack>
                      </control>
                    //<br/> // TODO(KORE): support latex annotations #1799
                      <env color="violet"> .Map </env>
                      <holds color="black"> .Map </holds>
                      <id color="pink"> 0 </id>
                    </thread>
                  </threads>
                //<br/> // TODO(KORE): support latex annotations #1799
                  <genv color="pink"> .Map </genv>
                  <store color="white"> .Map </store>
                  <busy color="cyan"> .Set </busy>
                  <terminated color="red"> .Set </terminated>
                //<br/> // TODO(KORE): support latex annotations #1799
                  <input color="magenta" stream="stdin"> .List </input>
                  <output color="brown" stream="stdout"> .List </output>
                  <nextLoc color="gray"> 0 </nextLoc>
                </T>

Declarations and Initialization

We start by defining the semantics of declarations (for variables,
arrays and functions).

Variable Declaration

The SIMPLE syntax was desugared above so that each variable is
declared alone and its initialization is done as a separate statement.
The semantic rule below matches resulting variable declarations of the
form var X; on top of the k cell
(indeed, note that the k cell is complete, or round, to the
left, and is torn, or ruptured, to the right), allocates a fresh
location L in the store which is initialized with a special value
(indeed, the unit ., or nothing, is matched anywhere
in the map ‒note the tears at both sides‒ and replaced with the
mapping L ↦ ⊥), and binds X to L in the local
environment shadowing previous declarations of X, if any.
This possible shadowing of X requires us to therefore update the
entire environment map, which is expensive and can significantly slow
down the execution of larger programs. On the other hand, since we know
that L is not already bound in the store, we simply add the binding
L ↦ ⊥ to the store, thus avoiding a potentially complete
traversal of the store map in order to update it. We prefer the approach
used for updating the store whenever possible, because, in addition to being
faster, it offers more true concurrency than the latter; indeed, according
to the concurrent semantics of K, the store is not frozen while
L ↦ ⊥ is added to it, while the environment is frozen during the
update operation Env[L/X]. The variable declaration command is
also removed from the top of the computation cell and the fresh location
counter is incremented. The undefined symbol added in the store
is of sort KItem, instead of Val, on purpose; this way, the
store lookup rules will get stuck when one attempts to lookup an
uninitialized location. All the above happen in one transactional step,
with the rule below. Note also how configuration abstraction allows us to
only mention the needed cells; indeed, as the configuration above states,
the k and env cells are actually located within a
thread cell within the threads cell, but one needs
not mention these: the configuration context of the rule is
automatically transformed to match the declared configuration
structure.

  syntax KItem ::= "undefined"  [latex(\bot)]

  rule <k> var X:Id; => . ...</k>
       <env> Env => Env[X <- L] </env>
       <store>... .Map => L |-> undefined ...</store>
       <nextLoc> L => L +Int 1 </nextLoc>

Array Declaration

The K semantics of the uni-dimensional array declaration is somehow similar
to the above declaration of ordinary variables. First, note the
context declaration below, which requests the evaluation of the array
dimension. Once evaluated, say to a natural number N, then
N +Int 1 locations are allocated in the store for
an array of size N, the additional location (chosen to be the first
one allocated) holding the array reference value. The array reference
value array(L,N) states that the array has size N and its
elements are located contiguously in the store starting with location
L. The operation L … L' ↦ V, defined at the end of this
file in the auxiliary operation section, initializes each location in
the list L … L' to V. Note that, since the dimensions of
array declarations can be arbitrary expressions, this virtually means
that we can dynamically allocate memory in SIMPLE by means of array
declarations.

  context var _:Id[HOLE];

  rule <k> var X:Id[N:Int]; => . ...</k>
       <env> Env => Env[X <- L] </env>
       <store>... .Map => L |-> array(L +Int 1, N)
                          (L +Int 1) ... (L +Int N) |-> undefined ...</store>
       <nextLoc> L => L +Int 1 +Int N </nextLoc>
    requires N >=Int 0

SIMPLE allows multi-dimensional arrays. For semantic simplicity, we
desugar them all into uni-dimensional arrays by code transformation.
This way, we only need to give semantics to uni-dimensional arrays.
First, note that the context rule above actually evaluates all the array
dimensions (that's why we defined the expression lists strict!).
Upon evaluating the array dimensions, the code generation rule below
desugars a multi-dimensional array declaration into uni-dimensional
declarations.
To this aim, we introduce two special unique variable identifiers,
$1 and $2. The first variable, $1, iterates
through and initializes each element of the first dimension with an array
of the remaining dimensions, declared as variable $2:

  syntax Id ::= "$1" | "$2"
  rule var X:Id[N1:Int, N2:Int, Vs:Vals];
    => var X[N1];
       {
         for(var $1 = 0; $1 <= N1 - 1; ++$1) {
           var $2[N2, Vs];
           X[$1] = $2;
         }
       }
    [structural]

Ideally, one would like to perform syntactic desugarings like the one
above before the actual semantics. Unfortunately, that was not possible in
this case because the dimension expressions of the multi-dimensional array need
to be evaluated first. Indeed, the desugaring rule above does not work if the
dimensions of the declared array are arbitrary expressions, because they can
have side effects (e.g., a[++x,++x]) and those side effects would be
propagated each time the expression is evaluated in the desugaring code (note
that both the loop condition and the nested multi-dimensional declaration
would need to evaluate the expressions given as array dimensions).

Function declaration

Functions are evaluated to λ-abstractions and stored like any other
values in the store. A binding is added into the environment for the function
name to the location holding its body. Similarly to the C language, SIMPLE
only allows function declarations at the top level of the program. More
precisely, the subsequent semantics of SIMPLE only works well when one
respects this requirement. Indeed, the simplistic context-free parser
generated by the grammar above is more generous than we may want, in that it
allows function declarations anywhere any declaration is allowed, including
inside arbitrary blocks. However, as the rule below shows, we are not
storing the declaration environment with the λ-abstraction value as
closures do. Instead, as seen shortly, we switch to the global environment
whenever functions are invoked, which is consistent with our requirement that
functions should only be declared at the top. Thus, if one declares local
functions, then one may see unexpected behaviors (e.g., when one shadows a
global variable before declaring a local function). The type checker of
SIMPLE, also defined in K (see examples/simple/typed/static),
discards programs which do not respect this requirement.

  rule <k> function F(Xs) S => . ...</k>
       <env> Env => Env[F <- L] </env>
       <store>... .Map => L |-> lambda(Xs, S) ...</store>
       <nextLoc> L => L +Int 1 </nextLoc>

When we are done with the first pass (pre-processing), the computation
cell k contains only the token execute (see the configuration
declaration above, where the computation item execute was placed
right after the program in the k cell of the initial configuration)
and the cell genv is empty. In this case, we have to call
main() and to initialize the global environment by transferring the
contents of the local environment into it. We prefer to do it this way, as
opposed to processing all the top level declarations directly within the global
environment, because we want to avoid duplication of semantics: the syntax of
the global declarations is identical to that of their corresponding local
declarations, so the semantics of the latter suffices provided that we copy
the local environment into the global one once we are done with the
pre-processing. We want this separate pre-processing step precisely because
we want to create the global environment. All (top-level) functions end up
having their names bound in the global environment and, as seen below, they
are executed in that same global environment; all these mean, in particular,
that the functions "see" each other, allowing for mutual recursion, etc.

  syntax KItem ::= "execute"
  rule <k> execute => main(.Exps); </k>
       <env> Env </env>
       <genv> .Map => Env </genv>  [structural]

Expressions

We next define the K semantics of all the expression constructs.

Variable lookup

When a variable X is the first computational task, and X is bound to some
location L in the environment, and L is mapped to some value V in the
store, then we rewrite X into V:

  rule <k> X:Id => V ...</k>
       <env>... X |-> L ...</env>
       <store>... L |-> V:Val ...</store>  [lookup]

Note that the rule above excludes reading ⊥ (the undefined value),
because ⊥ is not a value and V is checked at runtime to be a value.

Variable/Array increment

This is tricky, because we want to allow both ++x and ++a[5].
Therefore, we need to extract the lvalue of the expression to increment.
To do that, we state that the expression to increment should be wrapped
by the auxiliary lvalue operation and then evaluated. The semantics
of this auxiliary operation is defined at the end of this file. For now, all
we need to know is that it takes an expression and evaluates to a location
value. Location values, also defined at the end of the file, are integers
wrapped with the operation loc, to distinguish them from ordinary
integers.

  context ++(HOLE => lvalue(HOLE))
  rule <k> ++loc(L) => I +Int 1 ...</k>
       <store>... L |-> (I => I +Int 1) ...</store>  [increment]

Arithmetic operators

There is nothing special about the following rules. They rewrite the
language constructs to their library counterparts when their arguments
become values of expected sorts:

  rule I1 + I2 => I1 +Int I2
  rule Str1 + Str2 => Str1 +String Str2
  rule I1 - I2 => I1 -Int I2
  rule I1 * I2 => I1 *Int I2
  rule I1 / I2 => I1 /Int I2 requires I2 =/=K 0
  rule I1 % I2 => I1 %Int I2 requires I2 =/=K 0
  rule - I => 0 -Int I
  rule I1 < I2 => I1 <Int I2
  rule I1 <= I2 => I1 <=Int I2
  rule I1 > I2 => I1 >Int I2
  rule I1 >= I2 => I1 >=Int I2

The equality and inequality constructs reduce to syntactic comparison
of the two argument values (which is what the equality on K terms does).

  rule V1:Val == V2:Val => V1 ==K V2
  rule V1:Val != V2:Val => V1 =/=K V2

The logical negation is clear, but the logical conjunction and disjunction
are short-circuited:

  rule ! T => notBool(T)
  rule true  && E => E
  rule false && _ => false
  rule true  || _ => true
  rule false || E => E

Array lookup

Untyped SIMPLE does not check array bounds (the dynamically typed version of
it, in examples/simple/typed/dynamic, does check for array out of
bounds). The first rule below desugars the multi-dimensional array access to
uni-dimensional array access; recall that the array access operation was
declared strict, so all sub-expressions involved are already values at this
stage. The second rule rewrites the array access to a lookup operation at a
precise location; we prefer to do it this way to avoid locking the store.
The semantics of the auxiliary lookup operation is straightforward,
and is defined at the end of the file.

// The [anywhere] feature is underused, because it would only be used
// at the top of the computation or inside the lvalue wrapper. So it
// may not be worth it, or we may need to come up with a special notation
// allowing us to enumerate contexts for [anywhere] rules.
  rule V:Val[N1:Int, N2:Int, Vs:Vals] => V[N1][N2, Vs]
    [structural, anywhere]

  rule array(L,_)[N:Int] => lookup(L +Int N)
    [structural, anywhere]
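
For instance, in the hypothetical program below a[1,2] first desugars to
a[1][2]; the inner access rewrites to a lookup of the array reference
stored for row 1, and the outer access then rewrites to a lookup of the
element itself:

  function main() {
    var a[2,3];
    a[1,2] = 7;
    print(a[1,2], " ", a[1][2], "\n");   // both forms print 7
  }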

Size of an array

The size of the array is stored in the array reference value, and the
sizeOf construct was declared strict, so:

  rule sizeOf(array(_,N)) => N

Function call

Function application was strict in both its arguments, so we can
assume that both the function and its arguments are evaluated to
values (the former expected to be a λ-abstraction). The first
rule below matches a well-formed function application on top of the
computation and performs the following steps atomically: it switches
to the function body followed by return; (for the case in
which the function does not use an explicit return statement); it
pushes the remaining computation, the current environment, and the
current control data onto the function stack (the remaining
computation can thus also be discarded from the computation cell,
because an unavoidable subsequent return statement ‒see
above‒ will always recover it from the stack); it switches the
current environment (which is being pushed on the function stack) to
the global environment, which is where the free variables in the
function body should be looked up; it binds the formal parameters to
fresh locations in the new environment, and stores the actual
arguments to those locations in the store (this latter step is easily
done by reducing the problem to variable declarations, whose semantics
we have already defined; the auxiliary operation mkDecls is
defined at the end of the file). The second rule pops the
computation, the environment and the control data from the function
stack when a return statement is encountered as the next
computational task, passing the returned value to the popped
computation (the popped computation was the context in which the
returning function was called). Note that the pushing/popping of the
control data is crucial. Without it, one may have a function that
contains an exception block with a return statement inside, which
would put the xstack cell in an inconsistent state (since the
exception block modifies it, but that modification should be
irrelevant once the function returns). We add an artificial
nothing value to the language, which is returned by the
nullary return; statements.

  syntax KItem ::= (Map,K,ControlCellFragment)

  rule <k> lambda(Xs,S)(Vs:Vals) ~> K => mkDecls(Xs,Vs) S return; </k>
       <control>
         <fstack> .List => ListItem((Env,K,C)) ...</fstack>
         C
       </control>
       <env> Env => GEnv </env>
       <genv> GEnv </genv>

  rule <k> return(V:Val); ~> _ => V ~> K </k>
       <control>
         <fstack> ListItem((Env,K,C)) => .List ...</fstack>
         (_ => C)
       </control>
       <env> _ => Env </env>

  syntax Val ::= "nothing"
  rule return; => return nothing;   [macro]

Like for division-by-zero, it is left unspecified what happens
when the nothing value is used in domain calculations. For
example, from the perspective of the language semantics,
7 +Int nothing can evaluate to anything, or
may not evaluate at all (be undefined). If one wants to make sure that
such artificial values are never misused, then one needs to define a static
checker (also using K, like our type checker in
examples/simple/typed/static) and reject programs that misuse them.
Note that, unlike the undefined symbol which had the sort K
instead of Val, we defined nothing to be a value. That
is because, as explained above, we do not want the program to get
stuck when nothing is returned by a function. Instead, we want the
behavior to be unspecified; in particular, if one is careful to never
use the returned value in domain computation, like it happens when we
call a function for its side effects (e.g., with a statement of the
form f(x);), then the program does not get stuck.
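
For example, the following hypothetical program is fine, because the
nothing returned by inc() is immediately discarded by the
expression statement rule (given below); replacing the call inc(); with,
say, 1 + inc(); would get stuck instead:

  var y = 0;
  function inc() { ++y; }
  function main() { inc(); print(y, "\n"); }   // prints 1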

Read

The read() expression construct simply evaluates to the next
input value, at the same time discarding the input value from the
in cell.

  rule <k> read() => I ...</k> <input> ListItem(I:Int) => .List ...</input>  [read]

Assignment

In SIMPLE, like in C, assignments are expression constructs and not statement
constructs. To make it a statement all one needs to do is to follow it by a
semi-colon ; (see the semantics for expression statements below).
Like for the increment, we want to allow assignments not only to variables but
also to array elements, e.g., e1[e2] = e3 where e1 evaluates
to an array reference, e2 to a natural number, and e3 to any
value. Thus, we first compute the lvalue of the left-hand-side expression
that appears in an assignment, and then we do the actual assignment to the
resulting location:

  context (HOLE => lvalue(HOLE)) = _

  rule <k> loc(L) = V:Val => V ...</k> <store>... L |-> (_ => V) ...</store>
    [assignment]
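
Since assignment is an expression, strict only in its second argument, it
can also be nested; a hypothetical example:

  function main() {
    var x, y;
    x = (y = 3) + 1;        // y = 3 evaluates to 3, so x becomes 4
    print(x, " ", y, "\n");  // prints 4 3
  }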

Statements

We next define the K semantics of statements.

Blocks

Empty blocks are simply discarded, as shown in the first rule below.
For non-empty blocks, we schedule the enclosed statement but we have to
make sure the environment is recovered after the enclosed statement executes.
Recall that we allow local variable declarations, whose scope is the block
enclosing them. That is the reason for which we have to recover the
environment after the block. This allows us to have a very simple semantics
for variable declarations, as we did above. One can make the two rules below
computational if one wants them to count as computational steps.

  rule {} => .  [structural]
  rule <k> { S } => S ~> setEnv(Env) ...</k>  <env> Env </env>  [structural]

The basic definition of environment recovery is straightforward and
given in the section on auxiliary constructs at the end of the file.

There are two common alternatives to the above semantics of blocks.
One is to keep track of the variables which are declared in the block and only
recover those at the end of the block. This way one does more work for
variable declarations but conceptually less work for environment recovery; we
say conceptually because it is not clear that it is indeed the case that
one does less work when AC matching is involved. The other alternative is to
work with a stack of environments instead of a flat environment, and push the
current environment when entering a block and pop it when exiting it. This
way, one does more work when accessing variables (since one has to search the
variable in the environment stack in a top-down manner), but on the other hand
uses smaller environments and the definition gets closer to an implementation.
Based on experience with dozens of language semantics and other K definitions,
we have found that our approach above is the best trade-off between elegance
and efficiency (especially since rewrite engines have built-in techniques to
lazily copy terms, by need, thus not creating unnecessary copies),
so it is the one that we follow in general.

Sequential composition

Sequential composition is desugared into K's builtin sequentialization
operation (recall that, like in C, the semi-colon ; is not a
statement separator in SIMPLE — it is either a statement terminator or a
construct that turns an expression into a statement). The rule below is
structural, so it does not count as a computational step. One can make it
computational if one wants it to count as a step. Note that K allows
to define the semantics of SIMPLE in such a way that statements eventually
dissolve from the top of the computation when they are completed; this is in
sharp contrast to (artificially) evaluating them to a special
skip statement value and then getting rid of that special value, as
it is the case in other semantic approaches (where everything must evaluate
to something). This means that once S₁ completes in the rule below, S₂
becomes automatically the next computation item without any additional
(explicit or implicit) rules.

  rule S1:Stmt S2:Stmt => S1 ~> S2  [structural]

A subtle aspect of the rule above is that S₁ need not be an atomic
statement: since sequential composition is itself a production of sort
Stmt, the pattern S1:Stmt also matches compositions of statements.
This matters because desugaring macros can indeed produce left-associated
sequential compositions. For example, the code var x=0; x=1; is
desugared to (var x; x=0;) x=1;, so although originally the first term of
the sequential composition was an atomic statement, after desugaring it
became a composition itself. Note that the attribute [right] associated
to the sequential composition production is an attribute of the syntax, and not
of the semantics: e.g., it tells the parser to parse
var x; x=0; x=1; as var x; (x=0; x=1;), but it
does not tell the rewrite engine to rewrite (var x; x=0;) x=1; to
var x; (x=0; x=1;).

Expression statements

Expression statements are only used for their side effects, so their result
value is simply discarded. Common examples of expression statements are ones
of the form ++x;, x=e;, e1[e2]=e3;, etc.

  rule _:Val; => .

Conditional

Since the conditional was declared with the strict(1) attribute, we
can assume that its first argument will eventually be evaluated. The rules
below cover the only two possibilities in which the conditional is allowed to
proceed (otherwise the rewriting process gets stuck).

  rule if ( true) S else _ => S
  rule if (false) _ else S => S

While loop

The simplest way to give the semantics of the while loop is by unrolling.
Note, however, that its unrolling is only allowed when the while loop reaches
the top of the computation (to avoid non-termination of unrolling). We prefer
the rule below to be structural, because we don't want the unrolling of the
while loop to count as a computational step; this is unavoidable in
conventional semantics, but it is possible in K thanks to its distinction
between structural and computational rules. The simple while loop semantics
below works because our while loops in SIMPLE are indeed very basic. If we
allowed break/continue of loops then we would need a completely different
semantics, which would also involve the control cell.

  rule while (E) S => if (E) {S while(E)S}  [structural]
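
Concretely, one structural unrolling step rewrites the loop at the top of
the computation as sketched below (with body { n = n - 1; }):

  // while (n > 0) { n = n - 1; }
  //   => if (n > 0) { { n = n - 1; } while (n > 0) { n = n - 1; } }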

Print

The print statement was strict, so all its arguments are now
evaluated (recall that print is variadic). We append each of
its evaluated arguments to the output buffer, and discard the residual
print statement with an empty list of arguments.

  rule <k> print(V:Val, Es => Es); ...</k> <output>... .List => ListItem(V) </output>
    [print]
  rule print(.Vals); => .  [structural]

Exceptions

SIMPLE allows parametric exceptions, in that one can throw and catch a
particular value. The statement try S₁ catch(X) S₂
proceeds with the evaluation of S₁. If S₁ evaluates normally, i.e.,
without any exception thrown, then S₂ is discarded and the execution
continues normally. If S₁ throws an exception with a statement of the
form throw E, then E is first evaluated to some value V
(throw was declared to be strict), then V is bound to X, then
S₂ is evaluated in the new environment while the remainder of S₁ is
discarded, then the environment is recovered and the execution continues
normally with the statement following the try S₁ catch(X) S₂ statement.
Exceptions can be nested and the statements in the
catch part (S₂ in our case) can throw exceptions to the
upper level. One should be careful with how one handles the control data
structures here, so that the abrupt changes of control due to exception
throwing and to function returns interact correctly with each other.
For example, we want to allow function calls inside the statement S₁ in
a try S₁ catch(X) S₂ block which can throw an exception
that is not caught by the function but instead is propagated to the
try S₁ catch(X) S₂ block that called the function.
Therefore, we have to make sure that the function stack as well as other
potential control structures are also properly modified when the exception
is thrown to correctly recover the execution context. This can be easily
achieved by pushing/popping the entire current control context onto the
exception stack. The three rules below modularly do precisely the above.

  syntax KItem ::= (Id,Stmt,K,Map,ControlCellFragment)

  syntax KItem ::= "popx"

  rule <k> (try S1 catch(X) {S2} => S1 ~> popx) ~> K </k>
       <control>
         <xstack> .List => ListItem((X, S2, K, Env, C)) ...</xstack>
         C
       </control>
       <env> Env </env>

  rule <k> popx => . ...</k>
       <xstack> ListItem(_) => .List ...</xstack>

  rule <k> throw V:Val; ~> _ => { var X = V; S2 } ~> K </k>
       <control>
         <xstack> ListItem((X, S2, K, Env, C)) => .List ...</xstack>
         (_ => C)
       </control>
       <env> _ => Env </env>

The catch statement S₂ needs to be executed in the original environment,
but where the thrown value V is bound to the catch variable X. We here
chose to rely on two previously defined constructs when giving semantics to
the catch part of the statement: (1) the variable declaration with
initialization, for binding X to V; and (2) the block construct for
preventing X from shadowing variables in the original environment upon the
completion of S₂.
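
The hypothetical program below exercises precisely the interaction discussed
above: the exception is thrown inside a function call, so both the xstack
and the fstack must be restored to the context of the try block:

  function div(x, y) {
    if (y == 0) { throw "division by zero"; }
    return x / y;
  }
  function main() {
    try { print(div(7, 0), "\n"); }
    catch (msg) { print(msg, "\n"); }   // prints: division by zero
  }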

Threads

SIMPLE's threads can be created and terminated dynamically, and can
synchronize by acquiring and releasing re-entrant locks and by rendezvous.
We discuss the seven rules giving the semantics of these operations below.

Thread creation

Threads can be created by any other threads using the spawn S
construct. The spawn expression construct evaluates to the unique identifier
of the newly created thread and, at the same time, a new thread cell is added
into the configuration, initialized with the S statement and sharing the
same environment with the parent thread. Note that the newly created
thread cell is torn. That means that the remaining cells are added
and initialized automatically as described in the definition of SIMPLE's
configuration. This is part of K's configuration abstraction mechanism.

  rule <thread>...
         <k> spawn S => !T:Int ...</k>
         <env> Env </env>
       ...</thread>
       (.Bag => <thread>...
               <k> S </k>
               <env> Env </env>
               <id> !T </id>
             ...</thread>)

Thread termination

Dually to the above, when a thread terminates, its assigned computation (the
contents of its k cell) is empty, so the thread can be dissolved.
However, since no discipline is imposed on how locks are acquired and released,
it can be the case that a terminating thread still holds locks. Those locks
must be released, so other threads attempting to acquire them do not deadlock.
We achieve that by removing all the locks held by the terminating thread in its
holds cell from the set of busy locks in the busy cell
(keys(H) returns the domain of the map H as a set, that is, only
the locks themselves ignoring their multiplicity). As seen below, a lock is
added to the busy cell as soon as it is acquired for the first time
by a thread. The unique identifier of the terminated thread is also collected
into the terminated cell, so the join construct knows which
threads have terminated.

  rule (<thread>... <k>.</k> <holds>H</holds> <id>T</id> ...</thread> => .Bag)
       <busy> Busy => Busy -Set keys(H) </busy>
       <terminated>... .Set => SetItem(T) ...</terminated>

Thread joining

Thread joining is now straightforward: all we need to do is to check whether
the identifier of the thread to be joined is in the terminated cell.
If yes, then the join statement dissolves and the joining thread
continues normally; if not, then the joining thread gets stuck.

  rule <k> join T:Int; => . ...</k>
       <terminated>... SetItem(T) ...</terminated>

Acquire lock

There are two cases to distinguish when a thread attempts to acquire a lock
(in SIMPLE any value can be used as a lock):
(1) The thread does not currently have the lock, in which case it has to
take it provided that the lock is not already taken by another thread (see
the side condition of the first rule).
(2) The thread already has the lock, in which case it just increments its
counter for the lock (the locks are re-entrant). These two cases are captured
by the two rules below:

  rule <k> acquire V:Val; => . ...</k>
       <holds>... .Map => V |-> 0 ...</holds>
       <busy> Busy (.Set => SetItem(V)) </busy>
    requires (notBool(V in Busy))  [acquire]

  rule <k> acquire V; => . ...</k>
       <holds>... V:Val |-> (N => N +Int 1) ...</holds>

Release lock

Similarly, there are two corresponding cases to distinguish when a thread
releases a lock:
(1) The thread holds the lock more than once, in which case all it needs to do
is to decrement the lock counter.
(2) The thread holds the lock only once, in which case it needs to remove it
from its holds cell and also from the shared busy cell,
so other threads can acquire it if they need to.

  rule <k> release V:Val; => . ...</k>
       <holds>... V |-> (N => N -Int 1) ...</holds>
    requires N >Int 0

  rule <k> release V; => . ...</k> <holds>... V:Val |-> 0 => .Map ...</holds>
       <busy>... SetItem(V) => .Set ...</busy>

Rendezvous synchronization

In addition to synchronization through acquire and release of locks, SIMPLE
also provides a construct for rendezvous synchronization. A thread whose next
statement to execute is rendezvous(V) gets stuck until another
thread reaches an identical statement; when that happens, the two threads
drop their rendezvous statements and continue their executions. If three
threads happen to have an identical rendezvous statement as their next
statement, then precisely two of them will synchronize and the other will
remain blocked until another thread reaches a similar rendezvous statement.
The rule below is as simple as it can be. Note, however, that, again, it is
K's mechanism for configuration abstraction that makes it work as desired:
since the only cell with multiplicity * that contains a k cell is
the thread cell, the only way to concretize the rule below to the
actual configuration of SIMPLE is to place each k cell in its own
thread cell.

  rule <k> rendezvous V:Val; => . ...</k>
       <k> rendezvous V; => . ...</k>  [rendezvous]

Auxiliary declarations and operations

In this section we define all the auxiliary constructs used in the
above semantics.

Making declarations

The mkDecls auxiliary construct turns a list of identifiers
and a list of values into a sequence of corresponding variable
declarations.

  syntax Stmt ::= mkDecls(Ids,Vals)  [function]
  rule mkDecls((X:Id, Xs:Ids), (V:Val, Vs:Vals)) => var X=V; mkDecls(Xs,Vs)
  rule mkDecls(.Ids,.Vals) => {}
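
For example, here is a sketch of one expansion:

  // mkDecls((x, y, .Ids), (1, 2, .Vals))
  //   => var x = 1; var y = 2; {}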

Location lookup

The operation below is straightforward. Note that we tag it with the same
lookup tag as the variable lookup rule defined above. This way,
both rules will be considered transitions when we include the lookup
tag in the transition option of kompile.

  syntax Exp ::= lookup(Int)
  rule <k> lookup(L) => V ...</k> <store>... L |-> V:Val ...</store>  [lookup]

Environment recovery

We have already discussed the environment recovery auxiliary operation in the
IMP++ tutorial:

// TODO: eliminate the env wrapper, like we did in IMP++

  syntax KItem ::= setEnv(Map)
  rule <k> setEnv(Env) => . ...</k> <env> _ => Env </env>  [structural]

While theoretically sufficient, the basic definition for environment
recovery alone is suboptimal. Consider a loop while (E)S,
whose semantics (see above) was given by unrolling. S
is a block. Then the semantics of blocks above, together with the
unrolling semantics of the while loop, will yield a computation
structure in the k cell that increasingly grows, adding a new
environment recovery task right in front of the already existing sequence of
similar environment recovery tasks (this phenomenon is similar to the "tail
recursion" problem). Of course, when we have a sequence of environment
recovery tasks, we only need to keep the last one. The elegant rule below
does precisely that, thus avoiding the unnecessary computation explosion
problem:

  rule (setEnv(_) => .) ~> setEnv(_)  [structural]

In fact, the above follows a common convention in K for recovery
operations of cell contents: the meaning of a computation task of the form
cell(C) that reaches the top of the computation is that the current
contents of cell cell is discarded and gets replaced with C. We
did not add support for these special computation tasks in our current
implementation of K, so we need to define them as above.

lvalue and loc

For convenience in giving the semantics of constructs like the increment and
the assignment, that we want to operate the same way on variables and on
array elements, we used an auxiliary lvalue(E) construct which was
expected to evaluate to the lvalue of the expression E. This is only
defined when E has an lvalue, that is, when E is either a variable or
evaluates to an array element. lvalue(E) evaluates to a value of
the form loc(L), where L is the location where the value of E
can be found; for clarity, we use loc to structurally distinguish
natural numbers from location values. In giving semantics to lvalue
there are two cases to consider. (1) If E is a variable, then all we need
to do is to grab its location from the environment. (2) If E is an array
element, then we first evaluate the array and its index in order to identify
the exact location of the element of concern, and then return that location;
the last rule below works because its preceding context declarations ensure
that the array and its index are evaluated, and then the rule for array lookup
(defined above) rewrites the evaluated array access construct to its
corresponding store lookup operation.

// For parsing reasons, we prefer to allow lvalue to take a K

  syntax Exp ::= lvalue(K)
  syntax Val ::= loc(Int)

// Local variable

  rule <k> lvalue(X:Id => loc(L)) ...</k> <env>... X |-> L:Int ...</env>
    [structural]

// Array element: evaluate the array and its index;
// then the array lookup rule above applies.

  context lvalue(_::Exp[HOLE::Exps])
  context lvalue(HOLE::Exp[_::Exps])

// Finally, return the location of the desired array element

  rule lvalue(lookup(L:Int) => loc(L))  [structural]

Initializing multiple locations

The following operation initializes a sequence of locations with the same
value:

  syntax Map ::= Int "..." Int "|->" K
    [function, latex({#1}\ldots{#2}\mapsto{#3})]
  rule N...M |-> _ => .Map  requires N >Int M
  rule N...M |-> K => N |-> K (N +Int 1)...M |-> K  requires N <=Int M
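
For example, unfolding the function on a small range (a sketch):

  // 3...5 |-> undefined
  //   => 3 |-> undefined (4...5 |-> undefined)
  //   => ... => 3 |-> undefined 4 |-> undefined 5 |-> undefined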

The semantics of SIMPLE is now complete. Make sure you kompile the
definition with the right options in order to generate the desired model.
No kompile options are needed if you only want to execute the definition
(and thus get an interpreter), but if you want to search for different
program behaviors then you need to kompile with the transition option
including rule tags such as lookup, increment, acquire, etc. See the
IMP++ tutorial for what the transition option means and how to use it.

endmodule

Go to Lesson 2, SIMPLE typed static

SIMPLE — Untyped

Author: Grigore Roșu (grosu@illinois.edu)
Organization: University of Illinois at Urbana-Champaign

Author: Traian Florin Șerbănuță (traian.serbanuta@unibuc.ro)
Organization: University of Bucharest

Abstract

This is the K semantic definition of the untyped SIMPLE language.
SIMPLE is intended to be a pedagogical and research language that captures
the essence of the imperative programming paradigm, extended with several
features often encountered in imperative programming languages.
A program consists of a set of global variable declarations and
function definitions. Like in C, function definitions cannot be
nested and each program must have one function called main,
which is invoked when the program is executed. To make it more
interesting and to highlight some of K's strengths, SIMPLE includes
the following features in addition to the conventional imperative
expression and statement constructs:

  • Multidimensional arrays and array references. An array evaluates
    to an array reference, which is a special value holding a location (where
    the elements of the array start) together with the size of the array;
    the elements of the array can be array references themselves (particularly
    when the array is multi-dimensional). Array references are ordinary values,
    so they can be assigned to variables and passed/received by functions.

  • Functions and function values. Functions can have zero or
    more parameters and can return abruptly using a return statement.
    SIMPLE follows a call-by-value parameter passing style, with static scoping.
    Function names evaluate to function abstractions, which hereby become ordinary
    values in the language, same like the array references.

  • Blocks with locals. SIMPLE variables can be declared
    anywhere, their scope being from the place where they are declared
    until the end of the most nested enclosing block.

  • Input/Output. The expression read() evaluates to the
    next value in the input buffer, and the statement write(e)
    evaluates e and outputs its value to the output buffer. The
    input and output buffers are lists of values.

  • Exceptions. SIMPLE has parametric exceptions (the value thrown as
    an exception can be caught and bound).

  • Concurrency via dynamic thread creation/termination and
    synchronization. One can spawn a thread to execute any statement.
    The spawned thread shares with its parent its environment at creation time.
    Threads can be synchronized via a join command which blocks the current thread
    until the joined thread completes, via re-entrant locks which can be acquired
    and released, as well as through rendezvous commands.

Like in many other languages, some of SIMPLE's constructs can be
desugared into a smaller set of basic constructs. We do that at the end
of the syntax module, and then we only give semantics to the core constructs.

Note: This definition is commented slightly more than others, because it is
intended to be one of the first non-trivial definitions that the new
user of K sees. We recommend the beginner user to first check the
language definitions discussed in the K tutorial.

module SIMPLE-UNTYPED-SYNTAX
  imports DOMAINS-SYNTAX

Syntax

We start by defining the SIMPLE syntax. The language constructs discussed
above have the expected syntax and evaluation strategies. Recall that in K
we annotate the syntax with appropriate strictness attributes, thus giving
each language construct the desired evaluation strategy.

Identifiers

Recall from the K tutorial that identifiers are builtin and come under the
syntactic category Id. The special identifier for the function
main belongs to all programs, and plays a special role in the semantics,
so we declare it explicitly. This would not be necessary if the identifiers
were all included automatically in semantic definitions, but that is not
possible because of parsing reasons (e.g., K variables used to match
concrete identifiers would then be ambiguously parsed as identifiers). They
are only included in the parser generated to parse programs (and used by the
kast tool). Consequently, we have to explicitly declare all the
concrete identifiers that play a special role in the semantics, like
main below.

  syntax Id ::= "main" [token]

Declarations

There are two types of declarations: for variables (including arrays) and
for functions. We are going to allow declarations of the form
var x=10, a[10,10], y=23;, which is why we allow the var
keyword to take a list of expressions. The non-terminals used in the two
productions below are defined shortly.

  syntax Stmt ::= "var" Exps ";"
                | "function" Id "(" Ids ")" Block

Expressions

The expression constructs below are standard. Increment (++) takes
an expression rather than a variable because it can also increment an array
element. Recall that the syntax we define in K is what we call the syntax
of the semantics
: while powerful enough to define non-trivial syntaxes
(thanks to the underlying SDF technology that we use), we typically refrain
from defining precise syntaxes, that is, ones which accept precisely the
well-formed programs (that would not be possible anyway in general). That job
is deferred to type systems, which can also be defined in K. In other words,
we are not making any effort to guarantee syntactically that only variables
or array elements are passed to the increment construct, we allow any
expression. Nevertheless, we will only give semantics to those, so expressions
of the form ++5, which parse (but which will be rejected by our type
system in the typed version of SIMPLE later), will get stuck when executed.
Arrays can be multidimensional and can hold other arrays, so their
lookup operation takes a list of expressions as argument and applies to an
expression (which can in particular be another array lookup), respectively.
The construct sizeOf gives the size of an array in number of elements
of its first dimension. Note that almost all constructs are strict. The only
constructs which are not strict are the increment (since its first argument
gets updated, so it cannot be evaluated), the input read which takes no
arguments so strictness is irrelevant for it, the logical and and or constructs
which are short-circuited, the thread spawning construct which creates a new
thread executing the argument expression and return its unique identifier to
the creating thread (so it cannot just evaluate its argument in place), and the
assignment which is only strict in its second argument (for the same reason as
the increment).

  syntax Exp ::= Int | Bool | String | Id
               | "(" Exp ")"             [bracket]
               | "++" Exp
               > Exp "[" Exps "]"        [strict]
               > Exp "(" Exps ")"        [strict]
               | "-" Exp                 [strict]
               | "sizeOf" "(" Exp ")"    [strict]
               | "read" "(" ")"
               > left:
                 Exp "*" Exp             [strict, left]
               | Exp "/" Exp             [strict, left]
               | Exp "%" Exp             [strict, left]
               > left:
                 Exp "+" Exp             [strict, left]
               | Exp "-" Exp             [strict, left]
               > non-assoc:
                 Exp "<" Exp             [strict, non-assoc]
               | Exp "<=" Exp            [strict, non-assoc]
               | Exp ">" Exp             [strict, non-assoc]
               | Exp ">=" Exp            [strict, non-assoc]
               | Exp "==" Exp            [strict, non-assoc]
               | Exp "!=" Exp            [strict, non-assoc]
               > "!" Exp                 [strict]
               > left:
                 Exp "&&" Exp            [strict(1), left]
               | Exp "||" Exp            [strict(1), left]
               > "spawn" Block
               > Exp "=" Exp             [strict(2), right]

We also need comma-separated lists of identifiers and of expressions.
Moreover, we want them to be strict, that is, to evaluate to lists of results
whenever requested (e.g., when they appear as strict arguments of
the constructs above).

  syntax Ids  ::= List{Id,","}
  syntax Exps ::= List{Exp,","}          [strict]  // automatically hybrid now
  syntax Exps ::= Ids
  syntax Val
  syntax Vals ::= List{Val,","}
  syntax Bottom
  syntax Bottoms ::= List{Bottom,","}
  syntax Ids ::= Bottoms

Statements

Most of the statement constructs are standard for imperative languages.
We syntactically distinguish between empty and non-empty blocks, because we
chose Stmts not to be a (;-separated) list of
Stmt. Variables can be declared anywhere inside a block, their scope
ending with the block. Expressions are allowed to be used for their side
effects only (followed by a semicolon ;). Functions are allowed
to abruptly return. The exceptions are parametric, i.e., one can throw a value
which is bound to the variable declared by catch. Threads can be
dynamically created and terminated, and can synchronize with join,
acquire, release and rendezvous. Note that the
strictness attributes obey the intended evaluation strategy of the various
constructs. In particular, the if-then-else construct is strict only in its
first argument (the if-then construct will be desugared into if-then-else),
while the loop constructs are not strict in any arguments. The print
statement construct is variadic, that is, it takes an arbitrary number of
arguments.

  syntax Block ::= "{" "}"
                | "{" Stmt "}"

  syntax Stmt ::= Block
                | Exp ";"                               [strict]
                | "if" "(" Exp ")" Block "else" Block   [avoid, strict(1)]
                | "if" "(" Exp ")" Block
                | "while" "(" Exp ")" Block
                | "for" "(" Stmt Exp ";" Exp ")" Block
                | "return" Exp ";"                      [strict]
                | "return" ";"
                | "print" "(" Exps ")" ";"              [strict]
// NOTE: print strict allows non-deterministic evaluation of its arguments
// Either keep like this but document, or otherwise make Exps seqstrict.
// Of define and use a different expression list here, which is seqstrict.
                | "try" Block "catch" "(" Id ")" Block
                | "throw" Exp ";"                       [strict]
                | "join" Exp ";"                        [strict]
                | "acquire" Exp ";"                     [strict]
                | "release" Exp ";"                     [strict]
                | "rendezvous" Exp ";"                  [strict]

The reason we allow Stmts as the first argument of for
instead of Stmt is because we want to allow more than one statement
to be executed when the loop is initialized. Also, as seens shorly, macros
may expand one statement into more statements; for example, an initialized
variable declaration statement var x=0; desugars into two statements,
namely var x; x=0;, so if we use Stmt instead of Stmts
in the production of for above then we risk that the macro expansion
of statement var x=0; happens before the macro expansion of for,
also shown below, in which case the latter would not apply anymore because
of syntactic mismatch.

  syntax Stmt ::= Stmt Stmt                          [right]

// I wish I were able to write the following instead, but confuses the parser.
//
// syntax Stmts ::= List{Stmt,""}
// syntax Top ::= Stmt | "function" Id "(" Ids ")" Block
// syntax Pgm ::= List{Top,""}
//
// With that, I could have also eliminated the empty block

Desugared Syntax

This part desugars some of SIMPLE's language constructs into core ones.
We only want to give semantics to core constructs, so we get rid of the
derived ones before we start the semantics. All desugaring macros below are
straightforward.

  rule if (E) S => if (E) S else {}                                 [macro]
  rule for(Start Cond; Step) {S} => {Start while (Cond) {S Step;}}  [macro]
  rule for(Start Cond; Step) {} => {Start while (Cond) {Step;}}     [macro]
  rule var E1:Exp, E2:Exp, Es:Exps; => var E1; var E2, Es;          [macro-rec]
  rule var X:Id = E; => var X; X = E;                               [macro]

For the semantics, we can therefore assume from now on that each
conditional has both branches, that there are only while loops, and
that each variable is declared alone and without any initialization as part of
the declaration.

endmodule


module SIMPLE-UNTYPED
  imports SIMPLE-UNTYPED-SYNTAX
  imports DOMAINS

Basic Semantic Infrastructure

Before one starts adding semantic rules to a K definition, one needs to
define the basic semantic infrastructure consisting of definitions for
values and configuration. As discussed in the definitions
in the K tutorial, the values are needed to know when to stop applying
the heating rules and when to start applying the cooling rules corresponding
to strictness or context declarations. The configuration serves as a backbone
for the process of configuration abstraction which allows users to only
mention the relevant cells in each semantic rule, the rest of the configuration
context being inferred automatically. Although in some cases the configuration
could be automatically inferred from the rules, we believe that it is very
useful for language designers/semanticists to actually think of and design
their configuration explicitly, so the current implementation of K requires
one to define it.

Values

We here define the values of the language that the various fragments of
programs evaluate to. First, integers and Booleans are values. As discussed,
arrays evaluate to special array reference values holding (1) a location from
where the array's elements are contiguously allocated in the store, and
(2) the size of the array. Functions evaluate to function values as
λ-abstractions (we do not need to evaluate functions to closures
because each function is executed in the fixed global environment and
function definitions cannot be nested). Like in IMP and other
languages, we finally tell the tool that values are K results.

  syntax Val ::= Int | Bool | String
               | array(Int,Int)
               | lambda(Ids,Stmt)
  syntax Exp ::= Val
  syntax Exps ::= Vals
  syntax Vals ::= Bottoms
  syntax KResult ::= Val
                   | Vals  // TODO: should not need this

The inclusion of values in expressions follows the methodology of
syntactic definitions (like, e.g., in SOS): extend the syntax of the language
to encompass all values and additional constructs needed to give semantics.
In addition to that, it allows us to write the semantic rules using the
original syntax of the language, and to parse them with the same (now extended
with additional values) parser. If writing the semantics directly on the K
AST, using the associated labels instead of the syntactic constructs, then one
would not need to include values in expressions.

Configuration

The K configuration of SIMPLE consists of a top level cell, T,
holding a threads cell, a global environment map cell genv
mapping the global variables and function names to their locations, a shared
store map cell store mapping each location to some value, a set cell
busy holding the locks which have been acquired but not yet released
by threads, a set cell terminated holding the unique identifiers of
the threads which already terminated (needed for join), input
and output list cells, and a nextLoc cell holding a natural
number indicating the next available location. Unlike in the small languages
in the K tutorial, where we used the fresh predicate to generate fresh
locations, in larger languages, like SIMPLE, we prefer to explicitly manage
memory. The location counter in nextLoc models an actual physical
location in the store; for simplicity, we assume arbitrarily large memory and
no garbage collection. The threads cell contains one thread
cell for each existing thread in the program. Note that the thread cell has
multiplicity *, which means that at any given moment there could be zero,
one or more thread cells. Each thread cell contains a
computation cell k, a control cell holding the various
control structures needed to jump to certain points of interest in the program
execution, a local environment map cell env mapping the thread local
variables to locations in the store, and finally a holds map cell
indicating what locks have been acquired by the thread and not released so far
and how many times (SIMPLE's locks are re-entrant). The control cell
currently contains only two subcells, a function stack fstack which
is a list and an exception stack xstack which is also a list.
One can add more control structures in the control cell, such as a
stack for break/continue of loops, etc., if the language is extended with more
control-changing constructs. Note that all cells except for k are
also initialized, in that they contain a ground term of their corresponding
sort. The k cell is initialized with the program that will be passed
to the K tool, as indicated by the $PGM variable, followed by the
execute task (defined shortly).

  // the syntax declarations below are required because the sorts are
  // referenced directly by a production and, because of the way KIL to KORE
  // is implemented, the configuration syntax is not available yet
  // should simply work once KIL is removed completely
  // check other definitions for this hack as well

  syntax ControlCell
  syntax ControlCellFragment

  configuration <T color="red">
                  <threads color="orange">
                    <thread multiplicity="*" color="yellow">
                      <k color="green"> $PGM:Stmt ~> execute </k>
                    //<br/> // TODO(KORE): support latex annotations #1799
                      <control color="cyan">
                        <fstack color="blue"> .List </fstack>
                        <xstack color="purple"> .List </xstack>
                      </control>
                    //<br/> // TODO(KORE): support latex annotations #1799
                      <env color="violet"> .Map </env>
                      <holds color="black"> .Map </holds>
                      <id color="pink"> 0 </id>
                    </thread>
                  </threads>
                //<br/> // TODO(KORE): support latex annotations #1799
                  <genv color="pink"> .Map </genv>
                  <store color="white"> .Map </store>
                  <busy color="cyan"> .Set </busy>
                  <terminated color="red"> .Set </terminated>
                //<br/> // TODO(KORE): support latex annotations #1799
                  <input color="magenta" stream="stdin"> .List </input>
                  <output color="brown" stream="stdout"> .List </output>
                  <nextLoc color="gray"> 0 </nextLoc>
                </T>

Declarations and Initialization

We start by defining the semantics of declarations (for variables,
arrays and functions).

Variable Declaration

The SIMPLE syntax was desugared above so that each variable is
declared alone and its initialization is done as a separate statement.
The semantic rule below matches resulting variable declarations of the
form var X; on top of the k cell
(indeed, note that the k cell is complete, or round, to the
left, and is torn, or ruptured, to the right), allocates a fresh
location L in the store which is initialized with a special value
(indeed, the unit ., or nothing, is matched anywhere
in the map ‒note the tears at both sides‒ and replaced with the
mapping L ↦ ⊥), and binds X to L in the local
environment shadowing previous declarations of X, if any.
This possible shadowing of X requires us to therefore update the
entire environment map, which is expensive and can significantly slow
down the execution of larger programs. On the other hand, since we know
that L is not already bound in the store, we simply add the binding
L ↦ ⊥ to the store, thus avoiding a potentially complete
traversal of the the store map in order to update it. We prefer the approach
used for updating the store whenever possible, because, in addition to being
faster, it offers more true concurrency than the latter; indeed, according
to the concurrent semantics of K, the store is not frozen while
L ↦ ⊥ is added to it, while the environment is frozen during the
update operation Env[L/X]. The variable declaration command is
also removed from the top of the computation cell and the fresh location
counter is incremented. The undefined symbol added in the store
is of sort KItem, instead of Val, on purpose; this way, the
store lookup rules will get stuck when one attempts to lookup an
uninitialized location. All the above happen in one transactional step,
with the rule below. Note also how configuration abstraction allows us to
only mention the needed cells; indeed, as the configuration above states,
the k and env cells are actually located within a
thread cell within the threads cell, but one needs
not mention these: the configuration context of the rule is
automatically transformed to match the declared configuration
structure.

  syntax KItem ::= "undefined"  [latex(\bot)]

  rule <k> var X:Id; => . ...</k>
       <env> Env => Env[X <- L] </env>
       <store>... .Map => L |-> undefined ...</store>
       <nextLoc> L => L +Int 1 </nextLoc>

Array Declaration

The K semantics of the uni-dimensional array declaration is somehow similar
to the above declaration of ordinary variables. First, note the
context declaration below, which requests the evaluation of the array
dimension. Once evaluated, say to a natural number N, then
N +Int 1 locations are allocated in the store for
an array of size N, the additional location (chosen to be the first
one allocated) holding the array reference value. The array reference
value array(L,N) states that the array has size N and its
elements are located contiguously in the store starting with location
L. The operation L … L' ↦ V, defined at the end of this
file in the auxiliary operation section, initializes each location in
the list L … L' to V. Note that, since the dimensions of
array declarations can be arbitrary expressions, this virtually means
that we can dynamically allocate memory in SIMPLE by means of array
declarations.

  context var _:Id[HOLE];

  rule <k> var X:Id[N:Int]; => . ...</k>
       <env> Env => Env[X <- L] </env>
       <store>... .Map => L |-> array(L +Int 1, N)
                          (L +Int 1) ... (L +Int N) |-> undefined ...</store>
       <nextLoc> L => L +Int 1 +Int N </nextLoc>
    requires N >=Int 0

SIMPLE allows multi-dimensional arrays. For semantic simplicity, we
desugar them all into uni-dimensional arrays by code transformation.
This way, we only need to give semantics to uni-dimensional arrays.
First, note that the context rule above actually evaluates all the array
dimensions (that's why we defined the expression lists strict!):
Upon evaluating the array dimensions, the code generation rule below
desugars multi-dimensional array declaration to uni-dimensional declarations.
To this aim, we introduce two special unique variable identifiers,
$1 and $2. The first variable, $1, iterates
through and initializes each element of the first dimension with an array
of the remaining dimensions, declared as variable $2:

  syntax Id ::= "$1" | "$2"
  rule var X:Id[N1:Int, N2:Int, Vs:Vals];
    => var X[N1];
       {
         for(var $1 = 0; $1 <= N1 - 1; ++$1) {
           var $2[N2, Vs];
           X[$1] = $2;
         }
       }
    [structural]

Ideally, one would like to perform syntactic desugarings like the one
above before the actual semantics. Unfortunately, that was not possible in
this case because the dimension expressions of the multi-dimensional array need
to be evaluated first. Indeed, the desugaring rule above does not work if the
dimensions of the declared array are arbitrary expressions, because they can
have side effects (e.g., a[++x,++x]) and those side effects would be
propagated each time the expression is evaluated in the desugaring code (note
that both the loop condition and the nested multi-dimensional declaration
would need to evaluate the expressions given as array dimensions).

Function declaration

Functions are evaluated to λ-abstractions and stored like any other
values in the store. A binding is added into the environment for the function
name to the location holding its body. Similarly to the C language, SIMPLE
only allows function declarations at the top level of the program. More
precisely, the subsequent semantics of SIMPLE only works well when one
respects this requirement. Indeed, the simplistic context-free parser
generated by the grammar above is more generous than we may want, in that it
allows function declarations anywhere any declaration is allowed, including
inside arbitrary blocks. However, as the rule below shows, we are not
storing the declaration environment with the λ-abstraction value as
closures do. Instead, as seen shortly, we switch to the global environment
whenever functions are invoked, which is consistent with our requirement that
functions should only be declared at the top. Thus, if one declares local
functions, then one may see unexpected behaviors (e.g., when one shadows a
global variable before declaring a local function). The type checker of
SIMPLE, also defined in K (see examples/simple/typed/static),
discards programs which do not respect this requirement.

  rule <k> function F(Xs) S => . ...</k>
       <env> Env => Env[F <- L] </env>
       <store>... .Map => L |-> lambda(Xs, S) ...</store>
       <nextLoc> L => L +Int 1 </nextLoc>

When we are done with the first pass (pre-processing), the computation
cell k contains only the token execute (see the configuration
declaration above, where the computation item execute was placed
right after the program in the k cell of the initial configuration)
and the cell genv is empty. In this case, we have to call
main() and to initialize the global environment by transferring the
contents of the local environment into it. We prefer to do it this way, as
opposed to processing all the top level declarations directly within the global
environment, because we want to avoid duplication of semantics: the syntax of
the global declarations is identical to that of their corresponding local
declarations, so the semantics of the latter suffices provided that we copy
the local environment into the global one once we are done with the
pre-processing. We want this separate pre-processing step precisely because
we want to create the global environment. All (top-level) functions end up
having their names bound in the global environment and, as seen below, they
are executed in that same global environment; all these mean, in particular,
that the functions "see" each other, allowing for mutual recursion, etc.

  syntax KItem ::= "execute"
  rule <k> execute => main(.Exps); </k>
       <env> Env </env>
       <genv> .Map => Env </genv>  [structural]

Expressions

We next define the K semantics of all the expression constructs.

Variable lookup

When a variable X is the first computational task, and X is bound to some
location L in the environment, and L is mapped to some value V in the
store, then we rewrite X into V:

  rule <k> X:Id => V ...</k>
       <env>... X |-> L ...</env>
       <store>... L |-> V:Val ...</store>  [lookup]

Note that the rule above excludes reading , because is not
a value and V is checked at runtime to be a value.

Variable/Array increment

This is tricky, because we want to allow both ++x and ++a[5].
Therefore, we need to extract the lvalue of the expression to increment.
To do that, we state that the expression to increment should be wrapped
by the auxiliary lvalue operation and then evaluated. The semantics
of this auxiliary operation is defined at the end of this file. For now, all
we need to know is that it takes an expression and evaluates to a location
value. Location values, also defined at the end of the file, are integers
wrapped with the operation loc, to distinguish them from ordinary
integers.

  context ++(HOLE => lvalue(HOLE))
  rule <k> ++loc(L) => I +Int 1 ...</k>
       <store>... L |-> (I => I +Int 1) ...</store>  [increment]

Arithmetic operators

There is nothing special about the following rules. They rewrite the
language constructs to their library counterparts when their arguments
become values of expected sorts:

  rule I1 + I2 => I1 +Int I2
  rule Str1 + Str2 => Str1 +String Str2
  rule I1 - I2 => I1 -Int I2
  rule I1 * I2 => I1 *Int I2
  rule I1 / I2 => I1 /Int I2 requires I2 =/=K 0
  rule I1 % I2 => I1 %Int I2 requires I2 =/=K 0
  rule - I => 0 -Int I
  rule I1 < I2 => I1 <Int I2
  rule I1 <= I2 => I1 <=Int I2
  rule I1 > I2 => I1 >Int I2
  rule I1 >= I2 => I1 >=Int I2

The equality and inequality constructs reduce to syntactic comparison
of the two argument values (which is what the equality on K terms does).

  rule V1:Val == V2:Val => V1 ==K V2
  rule V1:Val != V2:Val => V1 =/=K V2

The logical negation is clear, but the logical conjunction and disjunction
are short-circuited:

  rule ! T => notBool(T)
  rule true  && E => E
  rule false && _ => false
  rule true  || _ => true
  rule false || E => E

Array lookup

Untyped SIMPLE does not check array bounds (the dynamically typed version of
it, in examples/simple/typed/dynamic, does check for array out of
bounds). The first rule below desugars the multi-dimensional array access to
uni-dimensional array access; recall that the array access operation was
declared strict, so all sub-expressions involved are already values at this
stage. The second rule rewrites the array access to a lookup operation at a
precise location; we prefer to do it this way to avoid locking the store.
The semantics of the auxiliary lookup operation is straightforward,
and is defined at the end of the file.

// The [anywhere] feature is underused, because it would only be used
// at the top of the computation or inside the lvalue wrapper. So it
// may not be worth, or we may need to come up with a special notation
// allowing us to enumerate contexts for [anywhere] rules.
  rule V:Val[N1:Int, N2:Int, Vs:Vals] => V[N1][N2, Vs]
    [structural, anywhere]

  rule array(L,_)[N:Int] => lookup(L +Int N)
    [structural, anywhere]

Size of an array

The size of the array is stored in the array reference value, and the
sizeOf construct was declared strict, so:

  rule sizeOf(array(_,N)) => N

Function call

Function application was strict in both its arguments, so we can
assume that both the function and its arguments are evaluated to
values (the former expected to be a λ-abstraction). The first
rule below matches a well-formed function application on top of the
computation and performs the following steps atomically: it switches
to the function body followed by return; (for the case in
which the function does not use an explicit return statement); it
pushes the remaining computation, the current environment, and the
current control data onto the function stack (the remaining
computation can thus also be discarded from the computation cell,
because an unavoidable subsequent return statement ‒see
above‒ will always recover it from the stack); it switches the
current environment (which is being pushed on the function stack) to
the global environment, which is where the free variables in the
function body should be looked up; it binds the formal parameters to
fresh locations in the new environment, and stores the actual
arguments to those locations in the store (this latter step is easily
done by reducing the problem to variable declarations, whose semantics
we have already defined; the auxiliary operation mkDecls is
defined at the end of the file). The second rule pops the
computation, the environment and the control data from the function
stack when a return statement is encountered as the next
computational task, passing the returned value to the popped
computation (the popped computation was the context in which the
returning function was called). Note that the pushing/popping of the
control data is crucial. Without it, one may have a function that
contains an exception block with a return statement inside, which
would put the xstack cell in an inconsistent state (since the
exception block modifies it, but that modification should be
irrelevant once the function returns). We add an artificial
nothing value to the language, which is returned by the
nulary return; statements.

  syntax KItem ::=  (Map,K,ControlCellFragment)

  rule <k> lambda(Xs,S)(Vs:Vals) ~> K => mkDecls(Xs,Vs) S return; </k>
       <control>
         <fstack> .List => ListItem((Env,K,C)) ...</fstack>
         C
       </control>
       <env> Env => GEnv </env>
       <genv> GEnv </genv>

  rule <k> return(V:Val); ~> _ => V ~> K </k>
       <control>
         <fstack> ListItem((Env,K,C)) => .List ...</fstack>
         (_ => C)
       </control>
       <env> _ => Env </env>

  syntax Val ::= "nothing"
  rule return; => return nothing;   [macro]

Like for division-by-zero, it is left unspecified what happens
when the nothing value is used in domain calculations. For
example, from the perspective of the language semantics,
7 +Int nothing can evaluate to anything, or
may not evaluate at all (be undefined). If one wants to make sure that
such artificial values are never misused, then one needs to define a static
checker (also using K, like our type checker in
examples/simple/typed/static) and reject programs that misuse them.
Note that, unlike the undefined symbol, which had the sort K
instead of Val, we defined nothing to be a value. That
is because, as explained above, we do not want the program to get
stuck when nothing is returned by a function. Instead, we want the
behavior to be unspecified; in particular, if one is careful to never
use the returned value in domain computations, as happens when we
call a function for its side effects (e.g., with a statement of the
form f(x);), then the program does not get stuck.

Read

The read() expression construct simply evaluates to the next
input value, at the same time discarding the input value from the
in cell.

  rule <k> read() => I ...</k> <input> ListItem(I:Int) => .List ...</input>  [read]

Assignment

In SIMPLE, like in C, assignments are expression constructs and not statement
constructs. To make it a statement all one needs to do is to follow it by a
semi-colon ; (see the semantics for expression statements below).
Like for the increment, we want to allow assignments not only to variables but
also to array elements, e.g., e1[e2] = e3 where e1 evaluates
to an array reference, e2 to a natural number, and e3 to any
value. Thus, we first compute the lvalue of the left-hand-side expression
that appears in an assignment, and then we do the actual assignment to the
resulting location:

  context (HOLE => lvalue(HOLE)) = _

  rule <k> loc(L) = V:Val => V ...</k> <store>... L |-> (_ => V) ...</store>
    [assignment]

Statements

We next define the K semantics of statements.

Blocks

Empty blocks are simply discarded, as shown in the first rule below.
For non-empty blocks, we schedule the enclosed statement but we have to
make sure the environment is recovered after the enclosed statement executes.
Recall that we allow local variable declarations, whose scope is the block
enclosing them. That is the reason for which we have to recover the
environment after the block. This allows us to have a very simple semantics
for variable declarations, as we did above. One can make the two rules below
computational if one wants them to count as computational steps.

  rule {} => .  [structural]
  rule <k> { S } => S ~> setEnv(Env) ...</k>  <env> Env </env>  [structural]
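
For concreteness, the following (hypothetical) SIMPLE fragment shows
environment recovery at work: the inner declaration of x shadows the
outer one only until its enclosing block completes:

  // Hypothetical SIMPLE program: block-local declarations and shadowing.
  function main() {
    var x = 1;
    { var x = 2; print(x, "\n"); }  // expected output: 2
    print(x, "\n");                 // expected output: 1 (env recovered)
  }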

The basic definition of environment recovery is straightforward and
given in the section on auxiliary constructs at the end of the file.

There are two common alternatives to the above semantics of blocks.
One is to keep track of the variables which are declared in the block and only
recover those at the end of the block. This way one does more work for
variable declarations but conceptually less work for environment recovery; we
say conceptually because it is not clear that it is indeed the case that
one does less work when AC matching is involved. The other alternative is to
work with a stack of environments instead of a flat environment, and push the
current environment when entering a block and pop it when exiting it. This
way, one does more work when accessing variables (since one has to search the
variable in the environment stack in a top-down manner), but on the other hand
uses smaller environments and the definition gets closer to an implementation.
Based on experience with dozens of language semantics and other K definitions,
we have found that our approach above is the best trade-off between elegance
and efficiency (especially since rewrite engines have built-in techniques to
lazily copy terms, by need, thus not creating unnecessary copies),
so it is the one that we follow in general.

Sequential composition

Sequential composition is desugared into K's builtin sequentialization
operation (recall that, like in C, the semi-colon ; is not a
statement separator in SIMPLE: it is either a statement terminator or a
construct for turning an expression into a statement). The rule below is
structural, so it does not count as a computational step. One can make it
computational if one wants it to count as a step. Note that K allows us
to define the semantics of SIMPLE in such a way that statements eventually
dissolve from the top of the computation when they are completed; this is in
sharp contrast to (artificially) evaluating them to a special
skip statement value and then getting rid of that special value, as
is the case in other semantic approaches (where everything must evaluate
to something). This means that once S₁ completes in the rule below, S₂
automatically becomes the next computational item without any additional
(explicit or implicit) rules.

  rule S1:Stmts S2:Stmts => S1 ~> S2  [structural]

A subtle aspect of the rule above is that S₁ is declared to have sort
Stmts and not Stmt. That is because desugaring macros can indeed
produce left associative sequential composition of statements. For example,
the code var x=0; x=1; is desugared to
(var x; x=0;) x=1;, so although originally the first term of
the sequential composition had sort Stmt, after desugaring it became
of sort Stmts. Note that the attribute [right] associated
with the sequential composition production is an attribute of the syntax, and not
of the semantics: e.g., it tells the parser to parse
var x; x=0; x=1; as var x; (x=0; x=1;), but it
does not tell the rewrite engine to rewrite (var x; x=0;) x=1; to
var x; (x=0; x=1;).

Expression statements

Expression statements are only used for their side effects, so their result
value is simply discarded. Common examples of expression statements are ones
of the form ++x;, x=e;, e1[e2]=e3;, etc.

  rule _:Val; => .

Conditional

Since the conditional was declared with the strict(1) attribute, we
can assume that its first argument will eventually be evaluated. The rules
below cover the only two possibilities in which the conditional is allowed to
proceed (otherwise the rewriting process gets stuck).

  rule if ( true) S else _ => S
  rule if (false) _ else S => S

While loop

The simplest way to give the semantics of the while loop is by unrolling.
Note, however, that its unrolling is only allowed when the while loop reaches
the top of the computation (to avoid non-termination of unrolling). We prefer
the rule below to be structural, because we don't want the unrolling of the
while loop to count as a computational step; this is unavoidable in
conventional semantics, but it is possible in K thanks to its distinction
between structural and computational rules. The simple while loop semantics
below works because our while loops in SIMPLE are indeed very basic. If we
allowed break/continue of loops then we would need a completely different
semantics, which would also involve the control cell.

  rule while (E) S => if (E) {S while(E)S}  [structural]
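
For example, the loop in the following (hypothetical) program unrolls
once per iteration, each unrolling being a structural (and thus
uncounted) step:

  // Hypothetical SIMPLE program: summing 1 through 5 with a while loop.
  function main() {
    var i = 1;
    var s = 0;
    while (i <= 5) { s = s + i; ++i; }
    print(s, "\n");  // expected output: 15
  }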

Print

The print statement was strict, so all its arguments are now
evaluated (recall that print is variadic). We append each of
its evaluated arguments to the output buffer, and discard the residual
print statement with an empty list of arguments.

  rule <k> print((V:Val, Es) => Es); ...</k> <output>... .List => ListItem(V) </output>
    [print]
  rule print(.Vals); => .  [structural]

Exceptions

SIMPLE allows parametric exceptions, in that one can throw and catch a
particular value. The statement try S₁ catch(X) S₂
proceeds with the evaluation of S₁. If S₁ evaluates normally, i.e.,
without any exception thrown, then S₂ is discarded and the execution
continues normally. If S₁ throws an exception with a statement of the
form throw E, then E is first evaluated to some value V
(throw was declared to be strict), then V is bound to X, then
S₂ is evaluated in the new environment while the remainder of S₁ is
discarded, then the environment is recovered and the execution continues
normally with the statement following the try S₁ catch(X) S₂ statement.
Exceptions can be nested and the statements in the
catch part (S₂ in our case) can throw exceptions to the
upper level. One should be careful with how one handles the control data
structures here, so that the abrupt changes of control due to exception
throwing and to function returns interact correctly with each other.
For example, we want to allow function calls inside the statement S₁ in
a try S₁ catch(X) S₂ block which can throw an exception
that is not caught by the function but instead is propagated to the
try S₁ catch(X) S₂ block that called the function.
Therefore, we have to make sure that the function stack as well as other
potential control structures are also properly modified when the exception
is thrown to correctly recover the execution context. This can be easily
achieved by pushing/popping the entire current control context onto the
exception stack. The three rules below modularly do precisely the above.

  syntax KItem ::= (Id,Stmt,K,Map,ControlCellFragment)

  syntax KItem ::= "popx"

  rule <k> (try S1 catch(X) {S2} => S1 ~> popx) ~> K </k>
       <control>
         <xstack> .List => ListItem((X, S2, K, Env, C)) ...</xstack>
         C
       </control>
       <env> Env </env>

  rule <k> popx => . ...</k>
       <xstack> ListItem(_) => .List ...</xstack>

  rule <k> throw V:Val; ~> _ => { var X = V; S2 } ~> K </k>
       <control>
         <xstack> ListItem((X, S2, K, Env, C)) => .List ...</xstack>
         (_ => C)
       </control>
       <env> _ => Env </env>

The catch statement S₂ needs to be executed in the original environment,
but where the thrown value V is bound to the catch variable X. We here
chose to rely on two previously defined constructs when giving semantics to
the catch part of the statement: (1) the variable declaration with
initialization, for binding X to V; and (2) the block construct for
preventing X from shadowing variables in the original environment upon the
completion of S₂.
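
The following (hypothetical) SIMPLE program illustrates the point about
control data made above: the exception thrown inside f is not caught
within f, yet the rules above correctly propagate it to the try block
of the caller:

  // Hypothetical SIMPLE program: an exception propagating out of a function.
  function f(x) {
    if (x < 0) { throw x; }  // abruptly leaves f as well
    return x;
  }

  function main() {
    try { print(f(-7), "\n"); }
    catch(e) { print("caught ", e, "\n"); }  // expected output: caught -7
  }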

Threads

SIMPLE's threads can be created and terminated dynamically, and can
synchronize by acquiring and releasing re-entrant locks and by rendezvous.
We discuss the seven rules giving the semantics of these operations below.

Thread creation

Threads can be created by any other threads using the spawn S
construct. The spawn expression construct evaluates to the unique identifier
of the newly created thread and, at the same time, a new thread cell is added
into the configuration, initialized with the S statement and sharing the
same environment with the parent thread. Note that the newly created
thread cell is only partially specified: the remaining cells are added
and initialized automatically as described in the definition of SIMPLE's
configuration. This is part of K's configuration abstraction mechanism.

  rule <thread>...
         <k> spawn S => !T:Int ...</k>
         <env> Env </env>
       ...</thread>
       (.Bag => <thread>...
               <k> S </k>
               <env> Env </env>
               <id> !T </id>
             ...</thread>)

Thread termination

Dually to the above, when a thread terminates its assigned computation (the
contents of its k cell) is empty, so the thread can be dissolved.
However, since no discipline is imposed on how locks are acquired and released,
it can be the case that a terminating thread still holds locks. Those locks
must be released, so other threads attempting to acquire them do not deadlock.
We achieve that by removing all the locks held by the terminating thread in its
holds cell from the set of busy locks in the busy cell
(keys(H) returns the domain of the map H as a set, that is, only
the locks themselves ignoring their multiplicity). As seen below, a lock is
added to the busy cell as soon as it is acquired for the first time
by a thread. The unique identifier of the terminated thread is also collected
into the terminated cell, so the join construct knows which
threads have terminated.

  rule (<thread>... <k>.</k> <holds>H</holds> <id>T</id> ...</thread> => .Bag)
       <busy> Busy => Busy -Set keys(H) </busy>
       <terminated>... .Set => SetItem(T) ...</terminated>

Thread joining

Thread joining is now straightforward: all we need to do is to check whether
the identifier of the thread to be joined is in the terminated cell.
If yes, then the join statement dissolves and the joining thread
continues normally; if not, then the joining thread gets stuck.

  rule <k> join T:Int; => . ...</k>
       <terminated>... SetItem(T) ...</terminated>
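
Here is a small (hypothetical) program exercising both thread creation
and thread joining: the child thread shares the parent's environment,
so its write to x is visible to the parent once join succeeds:

  // Hypothetical SIMPLE program: spawning and joining a thread.
  function main() {
    var x = 1;
    var t = spawn { x = x + 10; };  // shares the parent's environment
    join t;                         // blocks until the child terminates
    print(x, "\n");                 // expected output: 11
  }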

Acquire lock

There are two cases to distinguish when a thread attempts to acquire a lock
(in SIMPLE any value can be used as a lock):
(1) The thread does not currently have the lock, in which case it has to
take it provided that the lock is not already taken by another thread (see
the side condition of the first rule).
(2) The thread already has the lock, in which case it just increments its
counter for the lock (the locks are re-entrant). These two cases are captured
by the two rules below:

  rule <k> acquire V:Val; => . ...</k>
       <holds>... .Map => V |-> 0 ...</holds>
       <busy> Busy (.Set => SetItem(V)) </busy>
    requires (notBool(V in Busy))  [acquire]

  rule <k> acquire V; => . ...</k>
       <holds>... V:Val |-> (N => N +Int 1) ...</holds>

Release lock

Similarly, there are two corresponding cases to distinguish when a thread
releases a lock:
(1) The thread holds the lock more than once, in which case all it needs to do
is to decrement the lock counter.
(2) The thread holds the lock only once, in which case it needs to remove it
from its holds cell and also from the shared busy cell,
so other threads can acquire it if they need to.

  rule <k> release V:Val; => . ...</k>
       <holds>... V |-> (N => N -Int 1) ...</holds>
    requires N >Int 0

  rule <k> release V; => . ...</k> <holds>... V:Val |-> 0 => .Map ...</holds>
       <busy>... SetItem(V) => .Set ...</busy>
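
For illustration, the following (hypothetical) program uses the string
"lock" as a lock value (recall that any value can serve as a lock) to
make the two increments of x mutually exclusive:

  // Hypothetical SIMPLE program: protecting a shared variable with a lock.
  function main() {
    var x = 0;
    var t = spawn { acquire "lock"; x = x + 1; release "lock"; };
    acquire "lock"; x = x + 1; release "lock";
    join t;
    print(x, "\n");  // expected output: 2, under every interleaving
  }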

Rendezvous synchronization

In addition to synchronization through acquire and release of locks, SIMPLE
also provides a construct for rendezvous synchronization. A thread whose next
statement to execute is rendezvous(V) gets stuck until another
thread reaches an identical statement; when that happens, the two threads
drop their rendezvous statements and continue their executions. If three
threads happen to have an identical rendezvous statement as their next
statement, then precisely two of them will synchronize and the other will
remain blocked until another thread reaches a similar rendezvous statement.
The rule below is as simple as it can be. Note, however, that, again, it is
K's configuration abstraction mechanism that makes it work as desired:
since the only cell that can multiply and that contains a k cell is
the thread cell, the only way to concretize the rule below to the
actual configuration of SIMPLE is to include each k cell in a
thread cell.

  rule <k> rendezvous V:Val; => . ...</k>
       <k> rendezvous V; => . ...</k>  [rendezvous]
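
For example, in the following (hypothetical) program neither thread can
print its message before the other thread has also reached its
rendezvous statement:

  // Hypothetical SIMPLE program: two threads meeting at a rendezvous point.
  function main() {
    spawn { rendezvous 1; print("done (child)\n"); };
    rendezvous 1;  // blocks until the child reaches its own rendezvous
    print("done (parent)\n");
  }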

Auxiliary declarations and operations

In this section we define all the auxiliary constructs used in the
above semantics.

Making declarations

The mkDecls auxiliary construct turns a list of identifiers
and a list of values into a sequence of corresponding variable
declarations.

  syntax Stmt ::= mkDecls(Ids,Vals)  [function]
  rule mkDecls((X:Id, Xs:Ids), (V:Val, Vs:Vals)) => var X=V; mkDecls(Xs,Vs)
  rule mkDecls(.Ids,.Vals) => {}
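
For instance, assuming (hypothetical) identifiers x and y, the two
rules above expand a call to mkDecls as follows:

  // Worked expansion:
  //   mkDecls((x, y, .Ids), (1, 2, .Vals))
  //   => var x = 1; mkDecls((y, .Ids), (2, .Vals))
  //   => var x = 1; var y = 2; mkDecls(.Ids, .Vals)
  //   => var x = 1; var y = 2; {}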

Location lookup

The operation below is straightforward. Note that we tag it with the same
lookup tag as the variable lookup rule defined above. This way,
both rules will be considered transitions when we include the lookup
tag in the transition option of kompile.

  syntax Exp ::= lookup(Int)
  rule <k> lookup(L) => V ...</k> <store>... L |-> V:Val ...</store>  [lookup]

Environment recovery

We have already discussed the environment recovery auxiliary operation in the
IMP++ tutorial:

// TODO: eliminate the env wrapper, like we did in IMP++

  syntax KItem ::= setEnv(Map)
  rule <k> setEnv(Env) => . ...</k> <env> _ => Env </env>  [structural]

While theoretically sufficient, the basic definition for environment
recovery alone is suboptimal. Consider a loop while (E)S,
whose semantics (see above) was given by unrolling. S
is a block. Then the semantics of blocks above, together with the
unrolling semantics of the while loop, will yield a computation
structure in the k cell that increasingly grows, adding a new
environment recovery task right in front of the already existing sequence of
similar environment recovery tasks (this phenomenon is similar to the “tail
recursion” problem). Of course, when we have a sequence of environment
recovery tasks, we only need to keep the last one. The elegant rule below
does precisely that, thus avoiding the unnecessary computation explosion
problem:

  rule (setEnv(_) => .) ~> setEnv(_)  [structural]
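
Schematically, two adjacent recovery tasks collapse into the last one
(Env and Env' are hypothetical environments):

  //   setEnv(Env) ~> setEnv(Env') ~> K
  // becomes, by the rule above,
  //   setEnv(Env') ~> K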

In fact, the above follows a common convention in K for recovery
operations of cell contents: the meaning of a computation task of the form
cell(C) that reaches the top of the computation is that the current
contents of cell cell is discarded and gets replaced with C. We
did not add support for these special computation tasks in our current
implementation of K, so we need to define them as above.

lvalue and loc

For convenience in giving the semantics of constructs like the increment and
the assignment, which we want to operate the same way on variables and on
array elements, we used an auxiliary lvalue(E) construct which was
expected to evaluate to the lvalue of the expression E. This is only
defined when E has an lvalue, that is, when E is either a variable or
evaluates to an array element. lvalue(E) evaluates to a value of
the form loc(L), where L is the location where the value of E
can be found; for clarity, we use loc to structurally distinguish
natural numbers from location values. In giving semantics to lvalue
there are two cases to consider. (1) If E is a variable, then all we need
to do is to grab its location from the environment. (2) If E is an array
element, then we first evaluate the array and its index in order to identify
the exact location of the element of concern, and then return that location;
the last rule below works because its preceding context declarations ensure
that the array and its index are evaluated, and then the rule for array lookup
(defined above) rewrites the evaluated array access construct to its
corresponding store lookup operation.

// For parsing reasons, we prefer to allow lvalue to take a K

  syntax Exp ::= lvalue(K)
  syntax Val ::= loc(Int)

// Local variable

  rule <k> lvalue(X:Id => loc(L)) ...</k> <env>... X |-> L:Int ...</env>
    [structural]

// Array element: evaluate the array and its index;
// then the array lookup rule above applies.

  context lvalue(_::Exp[HOLE::Exps])
  context lvalue(HOLE::Exp[_::Exps])

// Finally, return the location of the desired array element

  rule lvalue(lookup(L:Int) => loc(L))  [structural]
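
To summarize, here is a worked derivation, assuming a (hypothetical)
variable a whose location in the store holds the value array(L,10):

  //   lvalue(a[3])
  //   => lvalue(array(L,10)[3])    // the contexts above evaluate array and index
  //   => lvalue(lookup(L +Int 3))  // the [anywhere] array lookup rule applies
  //   => loc(L +Int 3)             // the structural rule above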

Initializing multiple locations

The following operation initializes a sequence of locations with the same
value:

  syntax Map ::= Int "..." Int "|->" K
    [function, latex({#1}\ldots{#2}\mapsto{#3})]
  rule N...M |-> _ => .Map  requires N >Int M
  rule N...M |-> K => N |-> K (N +Int 1)...M |-> K  requires N <=Int M
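
For example, initializing locations 3 through 5 with the value 0
unfolds as follows:

  //   3...5 |-> 0
  //   => 3 |-> 0  (4...5 |-> 0)
  //   => 3 |-> 0  4 |-> 0  (5...5 |-> 0)
  //   => 3 |-> 0  4 |-> 0  5 |-> 0  (6...5 |-> 0)
  //   => 3 |-> 0  4 |-> 0  5 |-> 0                 // 6 > 5, so .Map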

The semantics of SIMPLE is now complete. Make sure you kompile the
definition with the right options in order to generate the desired model.
No kompile options are needed if you only want to execute the definition
(and thus get an interpreter), but if you want to search for different
program behaviors then you need to kompile with the transition option
including rule tags such as lookup, increment, acquire, etc. See the
IMP++ tutorial for what the transition option means and how to use it.

endmodule

Go to Lesson 2, SIMPLE typed static

SIMPLE — Typed — Static

Author: Grigore Roșu (grosu@illinois.edu)
Organization: University of Illinois at Urbana-Champaign

Author: Traian Florin Șerbănuță (traian.serbanuta@unibuc.ro)
Organization: University of Bucharest

Abstract

This is the K definition of the static semantics of the typed SIMPLE
language, or in other words, a type system for the typed SIMPLE
language in K. We do not re-discuss the various features of the
SIMPLE language here. The reader is referred to the untyped version of
the language for such discussions. We here only focus on the new and
interesting problems raised by the addition of type declarations, and
what it takes to devise a type system/checker for the language.

When designing a type system for a language, no matter in what
paradigm, we have to decide upon the intended typing policy. Note
that we can have multiple type systems for the same language, one for
each typing policy. For example, should we accept programs which
don't have a main function? Or should we allow functions that do not
return explicitly? Or should we allow functions whose type expects
them to return a value (say an int) to use a plain
return; statement, which returns no value, like in C?
And so on and so forth. Typically, there are two opposite tensions
when designing a type system. On the one hand, you want your type
system to be as permissive as possible, that is, to accept as many
programs that do not get stuck when executed with the untyped
semantics as possible; this will keep the programmers using your
language happy. On the other hand, you want your type system to have
a reasonable performance when implemented; this will keep both the
programmers and the implementers of your language happy. For example,
a type system for rejecting programs that could perform
division-by-zero is not expected to be feasible in general. A simple
guideline when designing typing policies is to imagine how the
semantics of the untyped language may get stuck and try to prevent
those situations from happening.

Before we give the K type system of SIMPLE formally, we discuss,
informally, the intended typing policy:

  • Each program should contain a main() function. Indeed,
    the untyped SIMPLE semantics will get stuck on any program which does
    not have a main function.

  • Each primitive value has its own type, which can be int,
    bool, or string. There is also a type void
    for nonexistent values, for example for the result of a function meant
    to return no value (but only be used for its side effects, like a
    procedure).

  • The syntax of untyped SIMPLE is extended to allow type
    declarations for all the variables, including array variables. This is
    done in C/Java style. For example, int x; or
    int x=7, y=x+3;, or int[][][] a[10,20];
    (the latter defines a 10 × 20 matrix of arrays of integers).
    Recall from untyped SIMPLE that, unlike in C/Java, our multi-dimensional
    arrays use comma-separated arguments, although they have the array-of-array
    semantics.

  • Functions are also typed in a C/Java style. However, since in SIMPLE
    we allow functions to be passed to and returned by other functions, we also
    need function types. We will use the conventional higher-order arrow-notation
    for function types, but will separate the argument types with commas. For
    example, a function returning an array of bool elements and
    taking as argument an array x of two-integer-argument functions
    returning an integer, is declared using a syntax of the form
    bool[] f(((int,int)->int)[] x) { ... }
    and has the type ((int,int)->int)[] -> bool[].

  • We allow any variable declarations at the top level. Functions
    can only be declared at the top level. Each function can only access the
    other functions and variables declared at the top level, or its own locally
    declared variables. SIMPLE has static scoping.

  • The various expression and statement constructs take only elements of
    the expected types.

  • Increment and assignment can operate both on variables and on array
    elements. For example, if f has type int->int[][] and
    function g has the type int->int, then the
    increment expression ++f(7)[g(2),g(3)] is valid.

  • Functions should only return values of their declared result
    type. To give the programmers more flexibility, we allow functions to
    use return; statements to terminate without returning an
    actual value, or to not explicitly use any return statement,
    regardless of their declared return type. This flexibility can be
    handy when writing programs using certain functions only for their
    side effects. Nevertheless, as the dynamic semantics shows, a return
    value is automatically generated when an explicit return
    statement is not encountered.

  • For simplicity, we here limit exceptions to only throw and catch
    integer values. We leave it as an exercise to the reader to extend the
    semantics to allow throwing and catching exceptions of arbitrary types.
    As in languages like Java, one can go even further and
    define a semantics where thrown exceptions are propagated through
    try-catch statements until one of the corresponding type is found.
    We will do this when we define the KOOL language, not here.
    To keep the definition of SIMPLE simple, here we do not attempt to
    reject programs which throw uncaught exceptions.

Like in untyped SIMPLE, some constructs can be desugared into a
smaller set of basic constructs. In general, it should be clear why a
program does not type-check by looking at the top of the k cells in
its stuck configuration.

module SIMPLE-TYPED-STATIC-SYNTAX
  imports DOMAINS-SYNTAX

Syntax

The syntax of typed SIMPLE extends that of untyped SIMPLE with support
for declaring types to variables and functions.

  syntax Id ::= "main" [token]

Types

Primitive, array and function types, as well as lists (or tuples) of types.
The lists of types are useful for function arguments.

  syntax Type ::= "void" | "int" | "bool" | "string"
                | Type "[" "]"
                | "(" Type ")"             [bracket]
                > Types "->" Type

  syntax Types ::= List{Type,","}

Declarations

Variable and function declarations have the expected syntax. For variables,
we basically just replaced the var keyword of untyped SIMPLE with a
type. For functions, besides replacing the function keyword with a
type, we also introduce a new syntactic category for typed variables,
Param, and lists over it.

  syntax Param ::= Type Id
  syntax Params ::= List{Param,","}

  syntax Stmt ::= Type Exps ";"
                | Type Id "(" Params ")" Block
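
For instance, the following (hypothetical) fragment illustrates the
declaration syntax above; note that, per the typing policy, a complete
program would also have to declare a main() function:

  // Hypothetical typed SIMPLE declarations.
  int x = 7, y = x + 3;    // typed variable declarations
  int a[10];               // an array of 10 integers
  int max(int p, int q) {  // a function of type int,int -> int
    if (p > q) { return p; } else { return q; }
  }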

Expressions

The syntax of expressions is identical to that in untyped SIMPLE,
except for the logical conjunction and disjunction which have
different strictness attributes, because they now have different
evaluation strategies.

  syntax Exp ::= Int | Bool | String | Id
               | "(" Exp ")"             [bracket]
               | "++" Exp
               > Exp "[" Exps "]"        [strict]
               > Exp "(" Exps ")"        [strict]
               | "-" Exp                 [strict]
               | "sizeOf" "(" Exp ")"    [strict]
               | "read" "(" ")"
               > left:
                 Exp "*" Exp             [strict, left]
               | Exp "/" Exp             [strict, left]
               | Exp "%" Exp             [strict, left]
               > left:
                 Exp "+" Exp             [strict, left]
               | Exp "-" Exp             [strict, left]
               > non-assoc:
                 Exp "<" Exp             [strict, non-assoc]
               | Exp "<=" Exp            [strict, non-assoc]
               | Exp ">" Exp             [strict, non-assoc]
               | Exp ">=" Exp            [strict, non-assoc]
               | Exp "==" Exp            [strict, non-assoc]
               | Exp "!=" Exp            [strict, non-assoc]
               > "!" Exp                 [strict]
               > left:
                 Exp "&&" Exp            [strict, left]
               | Exp "||" Exp            [strict, left]
               > "spawn" Block
               > Exp "=" Exp             [strict(2), right]

Note that spawn has not been declared strict. This may
seem unexpected, because the child thread shares the same environment
with the parent thread, so from a typing perspective the spawned
statement makes the same sense in a child thread as it makes in the
parent thread. The reason for not declaring it strict is that we
want to disallow programs where the spawned thread uses the
return statement, because those programs would get stuck in
the dynamic semantics. The type semantics of spawn below will reject
such programs.

We still need lists of expressions, defined below, but note that we do
not need lists of identifiers anymore. They have been replaced by the lists
of parameters.

  syntax Exps ::= List{Exp,","}          [strict]

Statements

The statements have the same syntax as in untyped SIMPLE, except for
the exceptions, which now type their parameter. Note that, unlike in untyped
SIMPLE, all statement constructs which have arguments and are not desugared
are strict, including the conditional and the while. Indeed, from a
typing perspective, they are all strict: first type their arguments and then
type the actual construct.

  syntax Block ::= "{" "}"
                | "{" Stmt "}"

  syntax Stmt ::= Block
                | Exp ";"                                  [strict]
                | "if" "(" Exp ")" Block "else" Block      [avoid, strict]
                | "if" "(" Exp ")" Block
                | "while" "(" Exp ")" Block                [strict]
                | "for" "(" Stmt Exp ";" Exp ")" Block
                | "return" Exp ";"                         [strict]
                | "return" ";"
                | "print" "(" Exps ")" ";"                 [strict]
                | "try" Block "catch" "(" Param ")" Block  [strict(1