K

K is a rewrite-based
executable semantic framework in which programming languages, type
systems and formal analysis tools can be defined using configurations
and rules. Configurations organize the state in units called cells,
which are labeled and can be nested. K rewrite rules make it explicit
which parts of the term are read-only, write-only, read-write, or
unused. This makes K suitable for defining truly concurrent languages
even in the presence of sharing. Computations are represented as
syntactic extensions of the original language abstract syntax, using a
nested list structure which sequentializes computational tasks, such
as program fragments. Computations are like any other terms in a
rewriting environment: they can be matched, moved from one place to
another, modified, or deleted. This makes K suitable for defining
control-intensive features such as abrupt termination, exceptions, or
call/cc.

K Tool Download

  • Install from the latest K GitHub Release.
  • Install pyk, K's scripting interface for Python. Check the API documentation for a complete reference of supported features.
  • Try our Editor Support page for links to K syntax highlighting definitions for various popular editors/IDEs. Please feel free to contribute.
  • Build or browse the code on GitHub, where you can also report bugs.

K Tutorial

The purpose of this series of lessons is to teach developers how to program in
K. While the primary use of K is in the specification of operational semantics
of programming languages, this tutorial is agnostic about how the knowledge of
K is used. For a more detailed tutorial explaining the basic principles of
programming language design, refer to the
K PL Tutorial. Note that the K PL Tutorial is currently somewhat out of date.

This K tutorial is a work in progress. Many lessons are currently simply
placeholders for future content.

To start the K tutorial, begin with
Section 1: Basic K Concepts.

Section 1: Basic K Concepts

The goal of this first section of the K tutorial is to teach the basic
principles of K to someone with no prior experience with K as a programming
language. However, it is not written for complete beginners to programming.
We assume that the reader has a firm grounding in computer science broadly,
and has prior experience writing code in functional programming languages.

By the end of this section, the reader ought to be able to write specifications
of simple languages in K, use these specifications to generate a fast
interpreter for their programming language, as well as write basic deductive
program verification proofs over programs in their language. This should give
them the theoretical grounding they need to begin expanding their knowledge
of K in Section 2: Intermediate K Concepts.

To begin this section, refer to
Lesson 1.1: Setting up a K Environment.

Lesson 1.1: Setting up a K Environment

The first step to learning K is to install K on your system, and configure your
editor for K development.

Installing K

You have two options for how to install K, depending on how you intend to
interact with the K codebase. If you are solely a user of K, and have no
interest in developing or making changes to K, you most likely will want to
install one of our binary releases of K. However, if you are going to be a K
developer, or simply want to build K from source, you should follow the
instructions for a source build of K.

Installing K from a binary release

K is developed as a rolling release, with each change to K that passes our
CI infrastructure being deployed on GitHub for download. The latest release of
K can be downloaded here.
This page also contains information on how to install K. It is recommended
that you fully uninstall the old version of K prior to installing the new one,
as K does not maintain entries in package manager databases, with the exception
of Homebrew on macOS.

Installing K from source

You can clone K from GitHub with the following Git command:

git clone https://github.com/runtimeverification/k --recursive

Instructions on how to build K from source can be found
here.

Configuring your editor

K maintains a set of scripts for a variety of text editors, including vim and
emacs, in various states of maintenance. You can download these scripts with
the following Git command:

git clone https://github.com/kframework/k-editor-support

Because K allows users to define their own grammars for parsing K itself,
not all features of K can be effectively highlighted. However, at the cost of
occasionally highlighting things incorrectly, you can get some pretty good
results in many cases. With that being said, some of the editor scripts in the
above repository are pretty out of date. If you manage to improve them, we
welcome pull requests into the repository.

Troubleshooting

If you have problems installing K, we encourage you to reach out to us. If you
follow the above install instructions and run into a problem, you can
create a bug report on GitHub.

Next lesson

Once you have set up K on your system to your satisfaction, you can continue to
Lesson 1.2: Basics of Functional K.

Lesson 1.2: Basics of Functional K

The purpose of this lesson is to explain the basics of productions and
rules in K. These are two types of K sentences. A K file consists of
one or more requires or modules in K. Each module consists of one or
more imports or sentences. For more information on requires, modules, and
sentences, refer to Lesson 1.5. However, for the time
being, just think of a module as a container for sentences, and don't worry
about requires or imports just yet.

Our first K program

To start with, input the following program into your editor as file
lesson-02-a.k:

module LESSON-02-A

  syntax Color ::= Yellow() | Blue()
  syntax Fruit ::= Banana() | Blueberry()
  syntax Color ::= colorOf(Fruit) [function]

  rule colorOf(Banana()) => Yellow()
  rule colorOf(Blueberry()) => Blue()

endmodule

Save this file and then run:

kompile lesson-02-a.k

kompile is K's compiler. By default, it takes a program or specification
written in K and compiles it into an interpreter for that input. Right now we
are compiling a single file. A set of K files that are compiled together is
called a K definition. We will cover multi-file K definitions later on.
kompile will output a directory containing everything needed to execute
programs and perform proofs using that definition. In this case, kompile will
(by default) create the directory lesson-02-a-kompiled under the current
directory.

Now, save the following input file in your editor as banana.color in the same
directory as lesson-02-a.k:

colorOf(Banana())

We can now evaluate this K term by running (from the same directory):

krun banana.color

krun will use the interpreter generated by the first call to kompile to
execute this program.

You will get the following output:

<k>
  Yellow ( ) ~> .
</k>

For now, don't worry about the <k>, </k>, or ~> . portions of this
output.

You can also execute small programs directly by specifying them on the command
line instead of putting them in a file. For example, the same program above
could also have been executed by running the following command:

krun -cPGM='colorOf(Banana())'

Now, let's look at what this definition and program did.

Productions, Constructors, and Functions

The first thing to realize is that this K definition contains 5 productions.
Productions are introduced with the syntax keyword, followed by a sort,
followed by the operator ::= followed by the definition of one or more
productions themselves, separated by the | operator. There are different
types of productions, but for now we only care about constructors and
functions. Each declaration separated by the | operator is individually
a single production, and the | symbol simply groups together productions that
have the same sort. For example, we could equally have written an identical K
definition lesson-02-b.k like so:

module LESSON-02-B

  syntax Color ::= Yellow()
  syntax Color ::= Blue()
  syntax Fruit ::= Banana()
  syntax Fruit ::= Blueberry()
  syntax Color ::= colorOf(Fruit) [function]

  rule colorOf(Banana()) => Yellow()
  rule colorOf(Blueberry()) => Blue()

endmodule

You can try compiling and running lesson-02-b.k to see that it produces the same output as lesson-02-a.k:

kompile lesson-02-b.k
krun -cPGM='colorOf(Banana())' --definition 'lesson-02-b-kompiled'

where the --definition option points to the directory containing a compiled version of LESSON-02-B.
Even the following definition is equivalent:

module LESSON-02-C

  syntax Color ::= Yellow()
                 | Blue()
                 | colorOf(Fruit) [function]
  syntax Fruit ::= Banana()
                 | Blueberry()

  rule colorOf(Banana()) => Yellow()
  rule colorOf(Blueberry()) => Blue()

endmodule

Each of these types of productions named above has the same underlying syntax,
but context and attributes are used to distinguish between the different
types. Tokens, brackets, lists, macros, aliases, and anywhere productions will
be covered in a later lesson, but this lesson does introduce us to constructors
and functions. Yellow(), Blue(), Banana(), and Blueberry() are
constructors. You can think of a constructor like a constructor for an
algebraic data type, if you're familiar with a functional language. The data
type itself is the sort that appears on the left of the ::= operator. Sorts
in K consist of uppercase identifiers.

Constructors can have arguments, but these ones do not. We will cover the
syntax of productions in detail in the next lesson, but for now, you can write
a production with no arguments as an uppercase or lowercase identifier followed
by the () operator.

A function is distinguished from a constructor by the presence of the
function attribute. Attributes appear in a comma separated list between
square brackets after any sentence, including both productions and rules.
Various attributes with built-in meanings exist in K and will be discussed
throughout the tutorial.

Exercise

Use krun to compute the return value of the colorOf function on a
Blueberry().

Rules, Matching, and Variables

Functions in K are given definitions using rules. A rule begins with the rule
keyword and contains at least one rewrite operator. The rewrite operator
is represented by the syntax =>. The rewrite operator is one of the built-in
productions in K, and we will discuss in more detail how it can be used in
future lessons, but for now, you can think of a rule as consisting of a
left-hand side and a right-hand side, separated by the rewrite
operator. On the left-hand side is the name of the function and zero or more
patterns corresponding to the parameters of the function. On the right-hand
side is another pattern. The meaning of the rule is relatively simple, having
defined these components. If the function is called with arguments that
match the patterns on the left-hand side, then the return value of the
function is the pattern on the right-hand side.

For example, in the above example, if the argument of the colorOf function
is Banana(), then the return value of the function is Yellow().
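
As a further illustration of how rules define a function, here is a minimal
sketch of a function that goes in the opposite direction, from colors to
fruits. The module name LESSON-02-INVERSE and the function fruitOf are
assumptions made for this example; they are not part of the lessons.

```k
module LESSON-02-INVERSE

  syntax Color ::= Yellow() | Blue()
  syntax Fruit ::= Banana() | Blueberry()

  // fruitOf is a hypothetical function introduced only for this sketch
  syntax Fruit ::= fruitOf(Color) [function]

  // One rule per constructor of the argument sort Color
  rule fruitOf(Yellow()) => Banana()
  rule fruitOf(Blue()) => Blueberry()

endmodule
```

If you save this as lesson-02-inverse.k and compile it with kompile, the
program fruitOf(Yellow()) should evaluate to Banana(), because the argument
matches the left-hand side of the first rule.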

So far we have seen that a constructor is a type of pattern in K. We
will introduce more complex patterns in later lessons, but there is one other
type of basic pattern: the variable. A variable, syntactically, consists
of an uppercase identifier. However, unlike a constructor, a variable will
match any pattern, with one restriction: two variables with the same name
must match the same pattern.

Here is a more complex example (lesson-02-d.k):

module LESSON-02-D

  syntax Container ::= Jar(Fruit)
  syntax Fruit ::= Apple() | Pear()

  syntax Fruit ::= contentsOfJar(Container) [function]

  rule contentsOfJar(Jar(F)) => F

endmodule

Here we see that Jar is a constructor with a single argument. You can write a
production with multiple arguments by putting the sorts of the arguments in a
comma-separated list inside the parentheses.

In this example, F is a variable. It will match either Apple() or Pear().
The return value of the function is created by substituting the matched
value of each variable for every occurrence of that variable on the
right-hand side of the rule.

To demonstrate, compile this definition and execute the following program with
krun:

contentsOfJar(Jar(Apple()))

You will see when you run it that the program returns Apple(), because that
is the pattern that was matched by F.
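
The restriction that two variables with the same name must match the same
pattern can be sketched with a two-argument constructor. The names
LESSON-02-CRATE, Crate, and matchingContents below are hypothetical and were
chosen only for this illustration.

```k
module LESSON-02-CRATE

  syntax Fruit ::= Apple() | Pear()

  // Crate is a hypothetical constructor with two arguments of sort Fruit
  syntax Container ::= Crate(Fruit, Fruit)

  syntax Fruit ::= matchingContents(Container) [function]

  // F appears twice on the left-hand side, so this rule only applies
  // when both arguments of Crate are the same fruit
  rule matchingContents(Crate(F, F)) => F

endmodule
```

With this definition, matchingContents(Crate(Pear(), Pear())) should return
Pear(). No rule applies to matchingContents(Crate(Apple(), Pear())), so this
sketch leaves the function undefined on that input.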

Exercises

  1. Extend the definition in lesson-02-a.k with the addition of blackberries
    and kiwis. For simplicity, blackberries are black and kiwis are green. Then
    compile your definition and test that your additional fruits are correctly
    handled by the colorOf function.
  2. Create a new definition which defines an outfit as a multi-argument
    constructor consisting of a hat, shirt, pants, and shoes. Define a new sort,
    Boolean, with two constructors, true and false. Each of hat, shirt, pants,
    and shoes will have a single argument (a color), either black or
    white. Then define an outfitMatching function that will return true if all
    the pieces of the outfit are the same color. You do not need to define the
    case that returns false. Write some tests that your function behaves the way
    you expect.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.3: BNF Syntax and Parser Generation.

Lesson 1.3: BNF Syntax and Parser Generation

The purpose of this lesson is to explain the full syntax and semantics of
productions in K, as well as how productions and other syntactic
sentences can be used to define grammars for parsing both rules and
programs.

K's approach to parsing

K's grammar is divided into two components: the outer syntax of K and the
inner syntax of K. Outer syntax refers to the parsing of requires,
modules, imports, and sentences in a K definition. Inner syntax
refers to the parsing of rules and programs. Unlike the outer syntax of
K, which is predetermined, much of the inner syntax of K is defined by you, the
developer. When rules or programs are parsed, they are parsed within the
context of a module. Rules are parsed in the context of the module in which
they exist, whereas programs are parsed in the context of the
main syntax module of a K definition. The productions and other syntactic
sentences in a module are used to construct the grammar of the module, which
is then used to perform parsing.

Basic BNF productions

To illustrate how this works, we will consider a simple K definition which
defines a relatively basic calculator capable of evaluating Boolean expressions
containing and, or, not, and xor.

Input the following program into your editor as file lesson-03-a.k:

module LESSON-03-A

  syntax Boolean ::= "true" | "false"
                   | "!" Boolean [function]
                   | Boolean "&&" Boolean [function]
                   | Boolean "^" Boolean [function]
                   | Boolean "||" Boolean [function]

endmodule

You will notice that the productions in this file look a little different than
the ones from the previous lesson. In point of fact, K has two different
mechanisms for defining productions. We have previously been focused
exclusively on the first mechanism, where the ::= symbol is followed by an
alphanumeric identifier followed by a comma-separated list of sorts in
parentheses. However, this is merely a special case of a more generic mechanism
for defining the syntax of productions using a variant of
Backus-Naur Form (BNF).

For example, in the previous lesson, we had the following set of productions:

module LESSON-03-B
  syntax Color ::= Yellow() | Blue()
  syntax Fruit ::= Banana() | Blueberry()
  syntax Color ::= colorOf(Fruit) [function]
endmodule

It turns out that this is equivalent to the following definition which defines
the same grammar, but using BNF notation:

module LESSON-03-C
  syntax Color ::= "Yellow" "(" ")" | "Blue" "(" ")"
  syntax Fruit ::= "Banana" "(" ")" | "Blueberry" "(" ")"
  syntax Color ::= "colorOf" "(" Fruit ")" [function]
endmodule

In this example, the sort of the argument to the function is unchanged, but
everything else has been wrapped in double quotation marks. This is because
in BNF notation, we distinguish between two types of production items:
terminals and non-terminals. A terminal represents simply a literal
string of characters that is verbatim part of the syntax of that production.
A non-terminal, conversely, represents a sort name, where the syntax of that
production accepts any valid term of that sort at that position.
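
The same translation can be applied to a constructor that takes an argument.
As a sketch, here is the Jar definition from the previous lesson written with
explicit terminals and non-terminals; the module name LESSON-03-JAR is an
assumption made for this example.

```k
module LESSON-03-JAR

  // Quoted strings are terminals; Fruit and Container are non-terminals
  syntax Fruit ::= "Apple" "(" ")" | "Pear" "(" ")"
  syntax Container ::= "Jar" "(" Fruit ")"
  syntax Fruit ::= "contentsOfJar" "(" Container ")" [function]

  // The rule is parsed using the grammar declared above
  rule contentsOfJar(Jar(F)) => F

endmodule
```

Here the non-terminal Fruit inside the Jar production marks the position where
any term of sort Fruit may appear, while every other item is matched
literally.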

This is why, when we wrote the program colorOf(Banana()), krun was able to
execute that program: because it represented a term of sort Color that was
parsed and interpreted by K's interpreter. In other words, krun parses and
interprets terms according to the grammar defined by the developer. The
program text is automatically converted into an AST of that term, and then the
colorOf function is evaluated using the function rules provided in the
definition.

You can ask yourself: How does K match the strings between the double quotes?
The answer is that K uses Flex to generate a scanner for the grammar. Flex looks
for the longest possible match of a regular expression in the input. If there
are ambiguities between 2 or more regular expressions, it will pick the one with
the highest prec attribute. You can learn more about how Flex matching works
here.

Bringing us back to the file lesson-03-a.k, we can see that this grammar
has given a simple BNF grammar for expressions over Booleans. We have defined
constructors corresponding to the Boolean values true and false, and functions
corresponding to the Boolean operators for and, or, not, and xor. We have also
given a syntax for each of these functions based on their syntax in the C
programming language. As such, we can now write programs in the simple language
we have defined.

Input the following program into your editor as and.bool in the same
directory:

true && false

We cannot interpret this program yet, because we have not given rules defining
the meaning of the && function, but we can parse it. To do this, you can
run (from the same directory):

kast --output kore and.bool

kast is K's just-in-time parser. It will generate a grammar from your K
definition on the fly and use it to parse the program passed on the command
line. The --output flag controls how the resulting AST is represented; don't
worry about the possible values yet, just use kore.

You ought to get the following AST printed on standard output, minus the
formatting:

inj{SortBoolean{}, SortKItem{}}(
  Lbl'UndsAnd-And-UndsUnds'LESSON-03-A'Unds'Boolean'Unds'Boolean'Unds'Boolean{}(
    Lbltrue'Unds'LESSON-03-A'Unds'Boolean{}(),
    Lblfalse'Unds'LESSON-03-A'Unds'Boolean{}()
  )
)

Don't worry about what exactly this means yet, just understand that it
represents the AST of the program that you just parsed. You ought to be able
to recognize the basic shape of it by seeing the words true, false, and
And in there. This is Kore, the intermediate representation of K, and we
will cover it in detail later.

Note that you can also tell kast to print the AST in other formats. For a
more direct representation of the original K, while still maintaining the
structure of an AST, you can say kast --output kast and.bool. This will
yield the following output:

`_&&__LESSON-03-A_Boolean_Boolean_Boolean`(
  `true_LESSON-03-A_Boolean`(.KList),
  `false_LESSON-03-A_Boolean`(.KList)
)

Note how the first output is largely a name-mangled version of the second
output. The one difference is the presence of the inj symbol in the KORE
output. We will talk more about this in later lessons.

Exercise

Parse the expression false || true with --output kast. See if you can
predict approximately what the corresponding output would be with
--output kore, then run the command yourself and compare it to your
prediction.

Ambiguities

Now let's try a slightly more advanced example. Input the following program
into your editor as and-or.bool:

true && false || false

When you try to parse this program, you ought to see the following error:

[Error] Inner Parser: Parsing ambiguity.
1: syntax Boolean ::= Boolean "||" Boolean [function]

`_||__LESSON-03-A_Boolean_Boolean_Boolean`(`_&&__LESSON-03-A_Boolean_Boolean_Boolean`(`true_LESSON-03-A_Boolean`(.KList),`false_LESSON-03-A_Boolean`(.KList)),`false_LESSON-03-A_Boolean`(.KList))
2: syntax Boolean ::= Boolean "&&" Boolean [function]

`_&&__LESSON-03-A_Boolean_Boolean_Boolean`(`true_LESSON-03-A_Boolean`(.KList),`_||__LESSON-03-A_Boolean_Boolean_Boolean`(`false_LESSON-03-A_Boolean`(.KList),`false_LESSON-03-A_Boolean`(.KList)))
        Source(./and-or.bool)
        Location(1,1,1,23)

This error is saying that kast was unable to parse this program because it is
ambiguous. K's just-in-time parser is a GLL parser, which means it can handle
the full generality of context-free grammars, including those grammars which
are ambiguous. An ambiguous grammar is one where the same string can be parsed
as multiple distinct ASTs. In this example, it can't decide whether it should
be parsed as (true && false) || false or as true && (false || false). As a
result, it reports the error to the user.

Brackets

Currently there is no way of resolving this ambiguity, making it impossible
to write complex expressions in this language. This is obviously a problem.
The standard solution in most programming languages to this problem is to
use parentheses to indicate the appropriate grouping. K generalizes this notion
into a type of production called a bracket. A bracket production in K
is any production with the bracket attribute. It is required that such a
production only have a single non-terminal, and the sort of the production
must equal the sort of that non-terminal. However, K does not otherwise
impose restrictions on the grammar the user provides for a bracket. With that
being said, the most common type of bracket is one in which a non-terminal
is surrounded by terminals representing some type of bracket such as
(), [], {}, <>, etc. For example, we can define the most common
type of bracket, the type used by the vast majority of programming languages,
quite simply.

Consider the following modified definition, which we will save to
lesson-03-d.k:

module LESSON-03-D

  syntax Boolean ::= "true" | "false"
                   | "(" Boolean ")" [bracket]
                   | "!" Boolean [function]
                   | Boolean "&&" Boolean [function]
                   | Boolean "^" Boolean [function]
                   | Boolean "||" Boolean [function]

endmodule

In this definition, if the user does not explicitly write parentheses, the
grammar remains ambiguous and K's just-in-time parser will report an error.
However, you are now able to parse more complex programs by means of explicitly
grouping subterms with the bracket we have just defined.

Consider and-or-left.bool:

(true && false) || false

Now consider and-or-right.bool:

true && (false || false)

If you parse these programs with kast, you will once again get a single
unique AST with no error. If you look, you might notice that the bracket itself
does not appear in the AST. In fact, this is a property unique to brackets:
productions with the bracket attribute are not represented in the parsed AST
of a term, and the child of the bracket is folded immediately into the parent
term. This is the reason for the requirement that a bracket production have
a single non-terminal of the same sort as the production itself.

Exercise

Write out what you expect the AST to be arising from parsing these two programs
above with --output kast, then parse them yourself and compare them to the
AST you expected. Confirm for yourself that the bracket production does not
appear in the AST.

Tokens

So far we have seen how we can define the grammar of a language. However,
the grammar is not the only relevant part of parsing a language. Also relevant
is the lexical syntax of the language. Thus far, we have implicitly been using
K's automatic lexer generation to generate a token in the scanner for each
terminal in our grammar. However, sometimes we wish to define more complex
lexical syntax. For example, consider the case of integers in C: an integer
consists of a decimal, octal, or hexadecimal number followed by an optional
suffix indicating the type of the literal.

In theory it would be possible to define this syntax via a grammar, but not
only would it be cumbersome and tedious, you would also then have to deal with
an AST generated for the literal which is not convenient to work with.

Instead of doing this, K allows you to define token productions, where
a production consists of a regular expression followed by the token
attribute, and the resulting AST consists of a typed string containing the
value recognized by the regular expression.

For example, the builtin integers in K are defined using the following
production:

syntax Int ::= r"[\\+\\-]?[0-9]+" [token]

Here we can see that we have defined that an integer is an optional sign
followed by a non-empty sequence of digits. The r preceding the terminal
indicates that what appears inside the double quotes is a regular expression,
and the token attribute indicates that terms which parse as this production
should be converted into a token by the parser.
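
To try a token production in isolation, a minimal sketch along the following
lines may help. The module name LESSON-03-HEX and the sort HexLiteral are
hypothetical; the regular expression is simply the hexadecimal case of a C
integer without any suffix.

```k
module LESSON-03-HEX

  // HexLiteral is a hypothetical sort used only for this sketch.
  // The r prefix marks the string as a regular expression, and the
  // token attribute makes the parser emit a typed string.
  syntax HexLiteral ::= r"0[xX][0-9a-fA-F]+" [token]

endmodule
```

Parsing a file containing 0x1F with kast against this definition should
produce a single token of sort HexLiteral carrying the text 0x1F, rather than
a tree of symbols.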

It is also possible to define tokens that do not use regular expressions. This
can be useful when you wish to declare particular identifiers for use in your
semantics later. For example:

syntax Id ::= "main" [token]

Here, we declare that main is a token of sort Id. Instead of being parsed
as a symbol, it gets parsed as a token, generating a typed string in the AST.
This is useful in a semantics of C because the parser generally does not treat
the main function in C specially; only the semantics treats it specially.

Of course, languages can have more complex lexical syntax. For example, if we
wish to define the syntax of integers in C, we could use the following
production:

syntax IntConstant ::= r"(([1-9][0-9]*)|(0[0-7]*)|(0[xX][0-9a-fA-F]+))(([uU][lL]?)|([uU]((ll)|(LL)))|([lL][uU]?)|(((ll)|(LL))[uU]?))?" [token]

As you may have noted above, long and complex regular expressions
can be hard to read. They also suffer from the problem that unlike a grammar,
they are not particularly modular.

We can get around this restriction by declaring explicit regular expressions,
giving them a name, and then referring to them in productions.

Consider the following (equivalent) way to define the lexical syntax of
integers in C:

syntax IntConstant ::= r"({DecConstant}|{OctConstant}|{HexConstant})({IntSuffix}?)" [token]
syntax lexical DecConstant = r"{NonzeroDigit}({Digit}*)"
syntax lexical OctConstant = r"0({OctDigit}*)"
syntax lexical HexConstant = r"{HexPrefix}({HexDigit}+)"
syntax lexical HexPrefix = r"0x|0X"
syntax lexical NonzeroDigit = r"[1-9]"
syntax lexical Digit = r"[0-9]"
syntax lexical OctDigit = r"[0-7]"
syntax lexical HexDigit = r"[0-9a-fA-F]"
syntax lexical IntSuffix = r"{UnsignedSuffix}({LongSuffix}?)|{UnsignedSuffix}{LongLongSuffix}|{LongSuffix}({UnsignedSuffix}?)|{LongLongSuffix}({UnsignedSuffix}?)"
syntax lexical UnsignedSuffix = r"[uU]"
syntax lexical LongSuffix = r"[lL]"
syntax lexical LongLongSuffix = r"ll|LL"

As you can see, this is rather more verbose, but it has the benefit of both
being much easier to read and understand, and also increased modularity.
Note that we refer to a named regular expression by putting the name in curly
brackets. Note also that only the first sentence actually declares a new piece
of syntax in the language. When the user writes syntax lexical, they are only
declaring a regular expression. To declare an actual piece of syntax in the
grammar, you still must actually declare an explicit token production.
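
A smaller sketch of the same technique, assuming a hypothetical binary literal
syntax that is not part of C or of this tutorial, looks like this. The names
LESSON-03-BIN, BinConstant, BinPrefix, and BinDigit are assumptions made for
the example.

```k
module LESSON-03-BIN

  // Only this sentence adds syntax to the grammar
  syntax BinConstant ::= r"{BinPrefix}({BinDigit}+)" [token]

  // These sentences only give names to regular expressions
  syntax lexical BinPrefix = r"0b|0B"
  syntax lexical BinDigit = r"[01]"

endmodule
```

If the token production were removed, the two syntax lexical sentences on
their own would not allow any program to be parsed, since they declare no
syntax.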

One final note: K uses Flex to implement
its lexical analysis. As a result, you can refer to the
Flex Manual
for a detailed description of the regular expression syntax supported. Note
that for performance reasons, Flex's regular expressions describe only regular
languages, and thus lack some of the syntactic conveniences of modern
"regular expression" libraries. If you need features that are not part of the
syntax of Flex regular expressions, you are encouraged to express them via
a grammar instead.

Ahead-of-time parser generation

So far we have been entirely focused on K's support for just-in-time parsing,
where the parser is generated on the fly prior to being used. This approach
has the benefit that the parser is faster to generate, but it performs poorly
if you have to repeatedly parse strings with the same parser. For this reason,
it is generally encouraged that when parsing programs, you use K's
ahead-of-time parser generation. K makes use of
GNU Bison to generate parsers.

You can enable ahead-of-time parsing via the --gen-bison-parser
flag to kompile. This will make use of Bison's LR(1) parser generator. As
such, if your grammar is not LR(1), it may not parse exactly the same as if
you were to use the just-in-time parser, because Bison will automatically pick
one of the possible branches whenever it encounters a shift-reduce or
reduce-reduce conflict. In this case, you can either modify your grammar to be
LR(1), or you can enable use of Bison's GLR support by instead passing
--gen-glr-bison-parser to kompile. Note that if your grammar is ambiguous,
the ahead-of-time parser will not provide you with particularly readable error
messages at this time.

If you have a K definition named foo.k, and it generates a directory when
you run kompile called foo-kompiled, you can invoke the ahead-of-time
parser you generated by running foo-kompiled/parser_PGM <file> on a file.

Exercises

  1. Compile lesson-03-d.k with ahead-of-time parsing enabled. Then compare
    how long it takes to run kast --output kore and-or-left.bool with how long it
    takes to run lesson-03-d-kompiled/parser_PGM and-or-left.bool. Confirm for
    yourself that both produce the same result, but that the latter is faster.

  2. Define a simple grammar consisting of integers, brackets, addition,
    subtraction, multiplication, division, and unary negation. Integers should be
    in decimal form and lexically without a sign, whereas negative numbers can be
    represented via unary negation. Ensure that you are able to parse some basic
    arithmetic expressions using a generated ahead-of-time parser. Do not worry
    about disambiguating the grammar or about writing rules to implement the
    operations in this definition.

  3. Write a program where the meaning of the arithmetic expression based on
    the grammar you defined above is ambiguous, and then write programs that
    express each individual intended meaning using brackets.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.4: Disambiguating Parses.

Lesson 1.4: Disambiguating Parses

The purpose of this lesson is to teach how to use K's builtin features for
disambiguation to transform an ambiguous grammar into an unambiguous one that
expresses the intended ASTs.

Priority blocks

In practice, very few formal languages outside the domain of natural language
processing are ambiguous. The main reason for this is that parsing unambiguous
languages is asymptotically faster than parsing ambiguous languages.
Programming language designers instead usually use the notions of operator
precedence and associativity to make expression grammars unambiguous. These
mechanisms work by instructing the parser to reject certain ASTs in favor of
others in case of ambiguities; it is often possible to remove all ambiguities
in a grammar with these techniques.

While it is sometimes possible to explicitly rewrite the grammar to remove
these parses, because K's grammar specification and AST generation are
inextricably linked, this is generally discouraged. Instead, we use the
approach of explicitly expressing the relative precedence of different
operators in different situations in order to resolve the ambiguity.

For example, in C, && binds tighter in precedence than ||, meaning that
the expression true && false || false has only one valid AST:
(true && false) || false.

Consider, then, the third iteration on the grammar of this definition
(lesson-04-a.k):

module LESSON-04-A

  syntax Boolean ::= "true" | "false"
                   | "(" Boolean ")" [bracket]
                   > "!" Boolean [function]
                   > Boolean "&&" Boolean [function]
                   > Boolean "^" Boolean [function]
                   > Boolean "||" Boolean [function]

endmodule

In this example, some of the | symbols separating productions in a single
block have been replaced with >. This serves to describe the
priority groups associated with this block of productions.
The first priority group consists of the atoms of the
language: true, false, and the bracket operator. In general, a priority
group starts either at the ::= or > operator and extends until either the
next > operator or the end of the production block. Thus, we can see that the
second, third, fourth, and fifth priority groups in this grammar all consist
of a single production.

The meaning of these priority groups becomes apparent when parsing programs:
A symbol with lesser priority (i.e., one that binds looser) cannot
appear as the direct child of a symbol with greater priority (i.e.,
one that binds tighter). In this case, the > operator can be seen as a
greater-than operator describing a transitive partial ordering on the
productions in the production block, expressing their relative priority.

To see this more concretely, let's look again at the program
true && false || false. As noted before, previously this program was
ambiguous because the parser could either choose that && was the child of ||
or vice versa. However, because a symbol with lesser priority (i.e., ||)
cannot appear as the direct child of a symbol with greater priority
(i.e., &&), the parser will reject the parse where || is under the
&& operator. As a result, we are left with the unambiguous parse
(true && false) || false. Similarly, true || false && false parses
unambiguously as true || (false && false). Conversely, if the user explicitly
wants the other parse, they can express this using brackets by explicitly
writing true && (false || false). This still parses successfully because the
|| operator is no longer the direct child of the && operator, but is
instead the direct child of the () operator, and the && operator is an
indirect parent, which is not subject to the priority restriction.

Astute readers, however, will already have noticed what seems to be a
contradiction: we have defined () as also having greater priority than ||.
One would think that this should mean that || cannot appear as a direct
child of (). This is a problem because priority groups are applied to every
possible parse separately. That is to say, even if the term is unambiguous
prior to this disambiguation rule, we still reject that parse if it violates
the rule of priority.

In fact, however, we do not reject this program as a parse error. Why is that?
Well, the rule for priority is slightly more complex than previously described.
In actual fact, it applies only conditionally. Specifically, it applies in
cases where the child is either the first or last production item in the
parent's production. For example, in the production Boolean "&&" Boolean, the
first Boolean non-terminal is not preceded by any terminals, and the last Boolean
non-terminal is not followed by any terminals. As a result of this, we apply
the priority rule to both children of &&. However, in the () operator,
the sole non-terminal is both preceded by and followed by terminals. As a
result, the priority rule is not applied when () is the parent. Because of
this, the program we mentioned above successfully parses.
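
As a hedged illustration of this conditional rule, consider the following
sketch. The module name PRIORITY-SKETCH and the ternary ?: production are
hypothetical and are not part of the lesson files. Going by the rule as
described above, the middle Boolean of ?: sits between two terminals, so the
priority restriction should not apply to it, while the first and last Boolean
remain subject to it.

```k
// Hypothetical sketch, not one of the lesson files.
// In the "?" ":" production, the middle Boolean is surrounded by the
// terminals "?" and ":", so the priority rule is not applied to that child.
// The first and last Boolean are the first and last production items,
// so the priority rule is applied to them.
module PRIORITY-SKETCH

  syntax Boolean ::= "true" | "false"
                   | "(" Boolean ")" [bracket]
                   > Boolean "&&" Boolean
                   > Boolean "?" Boolean ":" Boolean
                   > Boolean "||" Boolean

endmodule
```

Under this grammar, one would expect true ? false || true : false to parse
with || as the middle child of ?:, even though || has lesser priority, whereas
placing || as the first or last child of ?: would require brackets.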

Exercise

Parse the program true && false || false using kast, and confirm that the AST
places || as the top level symbol. Then modify the definition so that you
will get the alternative parse.

Associativity

Even having broken the expression grammar into priority blocks, the resulting
grammar is still ambiguous. We can see this if we try to parse the following
program (assoc.bool):

true && false && false

Priority blocks will not help us here: in both possible parses, the direct
parent and child belong to the same priority group (in this case, && is in the
same group as itself), so the priority rule rejects neither parse.

This is where the notion of associativity comes into play. Associativity
applies the following additional rules to parses:

  • a left-associative symbol cannot appear as a direct rightmost child of a
    symbol with equal priority;
  • a right-associative symbol cannot appear as a direct leftmost child of a
    symbol with equal priority; and
  • a non-associative symbol cannot appear as a direct leftmost or rightmost
    child of a symbol with equal priority.

In C, most binary operators, including &&, are left-associative (the
assignment operators are the exception), meaning that the expression
true && false && false parses unambiguously as (true && false) && false,
because && cannot appear as the rightmost child of itself.

Consider, then, the fourth iteration on the grammar of this definition
(lesson-04-b.k):

module LESSON-04-B

  syntax Boolean ::= "true" | "false"
                   | "(" Boolean ")" [bracket]
                   > "!" Boolean [function]
                   > left: Boolean "&&" Boolean [function]
                   > left: Boolean "^" Boolean [function]
                   > left: Boolean "||" Boolean [function]

endmodule

Here each priority group, immediately after the ::= or > operator, can
be followed by a symbol representing the associativity of that priority group:
either left: for left associativity, right: for right associativity, or
non-assoc: for non-associativity. In this example, each priority group we
apply associativity to has only a single production, but we could equally well
write a priority block with multiple productions and an associativity.
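
The lesson's grammar only uses left:, so here is a minimal sketch of the other
two annotations. The module ASSOC-SKETCH, the sort Exp, and its productions
are hypothetical names chosen purely for illustration.

```k
// Hypothetical sketch, not one of the lesson files.
module ASSOC-SKETCH

  syntax Exp ::= "x" | "y" | "z"
               | "(" Exp ")" [bracket]
               > right: Exp "^" Exp
               > non-assoc: Exp "==" Exp

endmodule
```

Following the associativity rules listed earlier, x ^ y ^ z should parse as
x ^ (y ^ z), because a right-associative symbol cannot be the leftmost child
of itself. The program x == y == z should be rejected outright, since a
non-associative symbol cannot be either the leftmost or the rightmost child of
itself; the user would have to add brackets to say which grouping is meant.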

For example, consider the following, different grammar (lesson-04-c.k):

module LESSON-04-C

  syntax Boolean ::= "true" | "false"
                   | "(" Boolean ")" [bracket]
                   > "!" Boolean [function]
                   > left:
                     Boolean "&&" Boolean [function]
                   | Boolean "^" Boolean [function]
                   | Boolean "||" Boolean [function]

endmodule

In this example, unlike the one above, &&, ^, and || have the same
priority. However, viewed as a group, the entire group is left associative.
This means that none of &&, ^, and || can appear as the right child of
any of &&, ^, or ||. As a result of this, this grammar is also not
ambiguous. However, it expresses a different grammar, and you are encouraged
to think about what the differences are in practice.

Exercise

Parse the program true && false && false yourself, and confirm that the AST
places the rightmost && at the top of the expression. Then modify the
definition to generate the alternative parse.

Explicit priority and associativity declarations

Previously we have only considered the case where all of the productions
which you wish to express a priority or associativity relation over are
co-located in the same block of productions. However, in practice this is not
always feasible or desirable, especially as a definition grows in size across
multiple modules.

As a result of this, K provides a second way of declaring priority and
associativity relations.

Consider the following grammar, which we will name lesson-04-d.k and which
will express the exact same grammar as lesson-04-b.k:

module LESSON-04-D

  syntax Boolean ::= "true" [group(literal)] | "false" [group(literal)]
                   | "(" Boolean ")" [group(atom), bracket]
                   | "!" Boolean [group(not), function]
                   | Boolean "&&" Boolean [group(and), function]
                   | Boolean "^" Boolean [group(xor), function]
                   | Boolean "||" Boolean [group(or), function]

  syntax priority literal atom > not > and > xor > or
  syntax left and
  syntax left xor
  syntax left or
endmodule

This introduces a couple of new features of K. First, the group(_) attribute
is used to conceptually group together sets of sentences under a common
user-defined name. For example, literal in the syntax priority sentence is
used to refer to all the productions marked with the group(literal) attribute,
i.e., true and false. A production can belong to multiple groups using
syntax such as group(myGrp1,myGrp2).

Once we understand this, it becomes relatively straightforward to understand
the meaning of this grammar. Each syntax priority sentence defines a
priority relation where > separates different priority groups. Each priority
group is defined by a list of one or more group names, and consists of all
productions which are members of at least one of those named groups.

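In the same way, a syntax left, syntax right, or syntax non-assoc sentence
defines an associativity relation among all the productions in the groups it
names, treating them as a single left-, right-, or non-associative group.
Specifically, this means that:

syntax left a b

is different from:

syntax left a
syntax left b

The first form makes the productions in a and b left-associative with respect
to one another as well as to themselves, whereas the second form only makes
each group left-associative with respect to itself. As a consequence of this,
syntax [left|right|non-assoc] should not be used to group together labels with
different priority.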
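
To make the multi-group forms concrete, here is a sketch of how the grammar of
lesson-04-c.k might be written with explicit declarations. The module name
LESSON-04-C-EXPLICIT is hypothetical; the group names are reused from
lesson-04-d.k.

```k
// Hypothetical sketch: lesson-04-c.k rewritten with explicit declarations.
module LESSON-04-C-EXPLICIT

  syntax Boolean ::= "true" [group(literal)] | "false" [group(literal)]
                   | "(" Boolean ")" [group(atom), bracket]
                   | "!" Boolean [group(not), function]
                   | Boolean "&&" Boolean [group(and), function]
                   | Boolean "^" Boolean [group(xor), function]
                   | Boolean "||" Boolean [group(or), function]

  // and, xor, and or are listed together, so they share one priority group.
  syntax priority literal atom > not > and xor or
  // A single sentence naming all three groups makes them left-associative
  // with respect to one another, not just with respect to themselves.
  syntax left and xor or
endmodule
```

Because and, xor, and or have the same priority here, grouping them in one
syntax left sentence is consistent with the advice above.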

Prefer/avoid

Sometimes priority and associativity prove insufficient to disambiguate a
grammar. In particular, sometimes it is desirable to be able to choose between
two ambiguous parses directly while still not rejecting any parses if the term
parsed is unambiguous. A good example of this is the famous "dangling else"
problem in imperative C-like languages.

Consider the following definition (lesson-04-e.k):

module LESSON-04-E

  syntax Exp ::= "true" | "false"
  syntax Stmt ::= "if" "(" Exp ")" Stmt
                | "if" "(" Exp ")" Stmt "else" Stmt
                | "{" "}"
endmodule

We can write the following program (dangling-else.if):

if (true) if (false) {} else {}

This is ambiguous because it is unclear whether the else clause is part of
the outer if or the inner if. At first we might try to resolve this with
priorities, saying that the if without an else cannot appear as a child of
the if with an else. However, because the non-terminal in the parent symbol
is both preceded and followed by a terminal, this will not work.

Instead, we can resolve the ambiguity directly by telling the parser to
"prefer" or "avoid" certain productions when ambiguities arise. For example,
when we parse this program, we see the following ambiguity as an error message:

[Error] Inner Parser: Parsing ambiguity.
1: syntax Stmt ::= "if" "(" Exp ")" Stmt

`if(_)__LESSON-04-E_Stmt_Exp_Stmt`(`true_LESSON-04-E_Exp`(.KList),`if(_)_else__LESSON-04-E_Stmt_Exp_Stmt_Stmt`(`false_LESSON-04-E_Exp`(.KList),`;_LESSON-04-E_Stmt`(.KList),`;_LESSON-04-E_Stmt`(.KList)))
2: syntax Stmt ::= "if" "(" Exp ")" Stmt "else" Stmt

`if(_)_else__LESSON-04-E_Stmt_Exp_Stmt_Stmt`(`true_LESSON-04-E_Exp`(.KList),`if(_)__LESSON-04-E_Stmt_Exp_Stmt`(`false_LESSON-04-E_Exp`(.KList),`;_LESSON-04-E_Stmt`(.KList)),`;_LESSON-04-E_Stmt`(.KList))
        Source(./dangling-else.if)
        Location(1,1,1,30)

Roughly, we see that the ambiguity is between an if with an else or an if
without an else. Since we want to pick the first parse, we can tell K to
"avoid" the second parse with the avoid attribute. Consider the following
modified definition (lesson-04-f.k):

module LESSON-04-F

  syntax Exp ::= "true" | "false"
  syntax Stmt ::= "if" "(" Exp ")" Stmt
                | "if" "(" Exp ")" Stmt "else" Stmt [avoid]
                | "{" "}"
endmodule

Here we have added the avoid attribute to the else production. As a result,
when an ambiguity occurs and one or more of the possible parses has that symbol
at the top of the ambiguous part of the parse, we remove those parses from
consideration and consider only those remaining. The prefer attribute behaves
similarly, but instead removes all parses which do not have that attribute.
In both cases, no action is taken if the parse is not ambiguous.
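
For comparison, here is a sketch of the same definition using prefer rather
than avoid. The module name LESSON-04-F-PREFER is hypothetical. Going by the
description above, marking the if without an else as preferred should keep
only the parse that has that production at the top of the ambiguity, which is
the same parse that lesson-04-f.k selects.

```k
// Hypothetical variant of lesson-04-f.k using prefer instead of avoid.
module LESSON-04-F-PREFER

  syntax Exp ::= "true" | "false"
  syntax Stmt ::= "if" "(" Exp ")" Stmt [prefer]
                | "if" "(" Exp ")" Stmt "else" Stmt
                | "{" "}"
endmodule
```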

Exercises

  1. Parse the program if (true) if (false) {} else {} using lesson-04-f.k
    and confirm that the else clause is part of the innermost if statement. Then
    modify the definition so that you will get the alternative parse.

  2. Modify your solution from Lesson 1.3, Exercise 2 so that unary negation
    binds tighter than multiplication and division, which bind tighter than
    addition and subtraction, and each binary operator is left associative.
    Write these priority and associativity declarations explicitly, and then
    try to write them inline.

  3. Write a simple grammar containing at least one ambiguity that cannot be
    resolved via priority or associativity, and then use the prefer attribute to
    resolve that ambiguity.

  4. Explain why the following grammar is not labeled ambiguous by the K
    parser when parsing abb, then make the parser realize the ambiguity.

module EXERCISE4

syntax Expr ::= "a" Expr "b"
              | "abb"
              | "b"

endmodule

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.5: Modules, Imports, and Requires.

Lesson 1.5: Modules, Imports, and Requires

The purpose of this lesson is to explain how K definitions can be broken into
separate modules and files and how these distinct components combine into a
complete K definition.

K's outer syntax

Recall from Lesson 1.3 that K's grammar is broken
into two components: the outer syntax of K and the inner syntax of K.
Outer syntax, as previously mentioned, consists of requires, modules,
imports, and sentences. A K semantics is expressed by the set of
sentences contained in the definition. The scope of what is considered
contained in that definition is determined both by the main semantics
module of a K definition, as well as the requires and imports present
in the file that contains that module.

Basic module syntax

The basic unit of grouping sentences in K is the module. A module consists
of a module name, an optional list of attributes, a list of
imports, and a list of sentences.

A module name consists of one or more groups of letters, numbers, or
underscores, separated by a hyphen. Here are some valid module names: FOO,
FOO-BAR, foo0, foo0_bar-Baz9. Here are some invalid module names: -,
-FOO, BAR-, FOO--BAR. Stylistically, module names are usually all
uppercase with hyphens separating words, but this is not strictly enforced.

Some example modules include an empty module:

module LESSON-05-A

endmodule

A module with some attributes:

module LESSON-05-B [group(attr1,attr2), private]

endmodule

A module with some sentences:

module LESSON-05-C
  syntax Boolean ::= "true" | "false"
  syntax Boolean ::= "not" Boolean [function]
  rule not true => false
  rule not false => true
endmodule

Imports

Thus far we have only discussed definitions containing a single module.
Definitions can also contain multiple modules, in which one module imports
others.

An import in K appears at the top of a module, prior to any sentences. It can
be specified with the imports keyword, followed by a module name.

For example, here is a simple definition with two modules (lesson-05-d.k):

module LESSON-05-D-1
  syntax Boolean ::= "true" | "false"
  syntax Boolean ::= "not" Boolean [function]
endmodule

module LESSON-05-D
  imports LESSON-05-D-1

  rule not true => false
  rule not false => true
endmodule

This K definition is equivalent to the definition expressed by the single module
LESSON-05-C. By importing a module, we include all of the sentences of the
imported module in the importing module.
There are a few minor differences between importing a module and simply
including its sentences in another module directly, but we will cover these
differences later. For now, you can think of modules as a way of
conceptually grouping sentences in a larger K definition.

Exercise

Modify lesson-05-d.k to include four modules: one containing the syntax, two
with one rule each that imports the first module, and a final module
LESSON-05-D containing no sentences that imports the second and third module.
Check to make sure the definition still compiles and that you can still evaluate
the not function.

Parsing in the presence of multiple modules

As you may have noticed, each module in a definition can express a distinct set
of syntax. When parsing the sentences in a module, we use the syntax
of that module, enriched with the basic syntax of K, in order to parse
rules in that module. For example, the following definition is a parser error
(lesson-05-e.k):

module LESSON-05-E-1
  rule not true => false
  rule not false => true
endmodule

module LESSON-05-E-2
  syntax Boolean ::= "true" | "false"
  syntax Boolean ::= "not" Boolean [function]
endmodule

This is because the syntax referenced in module LESSON-05-E-1, namely, not,
true, and false, is not imported by that module. You can solve this problem
by simply importing the modules containing the syntax you want to use in your
sentences.
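
A minimal sketch of that fix follows, under the assumption that nothing else
in the definition changes. Only the imports line is new; the modules are
listed with the syntax first purely for readability.

```k
// Hypothetical corrected variant of lesson-05-e.k.
module LESSON-05-E-2
  syntax Boolean ::= "true" | "false"
  syntax Boolean ::= "not" Boolean [function]
endmodule

module LESSON-05-E-1
  // Importing the module that declares not, true, and false makes that
  // syntax available when parsing the rules below.
  imports LESSON-05-E-2

  rule not true => false
  rule not false => true
endmodule
```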

Main syntax and semantics modules

When we are compiling a K definition, we need to know where to start. We
designate two specific entry point modules: the main syntax module
and the main semantics module. The main syntax module, as well as all the
modules it imports recursively, is used to create the parser for the programs
that you execute with krun. The main semantics
module, as well as all the modules it imports recursively, is used to
determine the rules that can be applied at runtime in order to execute a
program. For example, in lesson-05-d.k above, if the main semantics module is
LESSON-05-D-1, then not is an uninterpreted function (i.e., has no
rules associated with it), and the rules in module LESSON-05-D are not
included.

While you can specify the entry point modules explicitly by passing the
--main-module and --syntax-module flags to kompile, by default, if you
type kompile foo.k, then the main semantics module will be FOO and the
main syntax module will be FOO-SYNTAX.

Splitting a definition into multiple files

So far, we have discussed ways to break definitions into separate
conceptual components (modules). K also provides a mechanism for combining
multiple files into a single K definition, namely, the requires directive.

In K, the requires keyword has two meanings. The first, the requires
statement, appears at the top of a K file, prior to any module declarations. It
consists of the keyword requires followed by a double-quoted string. The
second meaning of the requires keyword will be covered in a later lesson,
but it is distinguished because the second case occurs only inside modules.

The string passed to the requires statement contains a filename. When you run
kompile on a file, it will look at all of the requires statements in that
file, look up those files on disk, parse them, and then recursively process all
the requires statements in those files. It then combines all the modules in all
of those files together, and uses them collectively as the set of modules to
which imports statements can refer.

Putting it all together

Putting it all together, here is one possible way in which we could break the
definition lesson-02-c.k from Lesson 1.2 into
multiple files and modules:

colors.k:

module COLORS
  syntax Color ::= Yellow()
                 | Blue()
endmodule

fruits.k:

module FRUITS
  syntax Fruit ::= Banana()
                 | Blueberry()
endmodule

colorOf.k:

requires "fruits.k"
requires "colors.k"

module COLOROF-SYNTAX
  imports COLORS
  imports FRUITS

  syntax Color ::= colorOf(Fruit) [function]
endmodule

module COLOROF
  imports COLOROF-SYNTAX

  rule colorOf(Banana()) => Yellow()
  rule colorOf(Blueberry()) => Blue()
endmodule

You would then compile this definition with kompile colorOf.k and use it the
same way as the original, single-module definition.

Exercise

Modify the name of the COLOROF module, and then recompile the definition.
Try to understand why you now get a compiler error. Then, resolve this compiler
error by passing the --main-module and --syntax-module flags to kompile.

Include path

One note can be made about how paths are resolved in requires statements.

By default, the path you specify is allowed to be an absolute or a relative
path. If the path is absolute, that exact file is imported. If the path is
relative, a matching file is looked for within all of the
include directories specified to the compiler. By default, the include
directories include the current working directory, followed by the
include/kframework/builtin directory within your installation of K. You can
also pass one or more directories to kompile via the -I command line flag,
in which case these directories are prepended to the beginning of the list.

Exercises

  1. Take the solution to Lesson 1.4, Exercise 2 which included the explicit
    priority and associativity declarations, and modify the definition so that
    the syntax of integers and brackets is in one module, the syntax of addition,
    subtraction, and unary negation is in another module, and the syntax of
    multiplication and division is in a third module. Make sure you can still parse
    the same set of expressions as before. Place priority declarations in the main
    module.

  2. Modify lesson-02-d.k from Lesson 1.2 so that the rules and syntax are in
    separate modules in separate files.

  3. Place the file containing the syntax from Exercise 2 in another directory,
    then recompile the definition. Observe why a compilation error occurs. Then
    fix the compiler error by passing -I to kompile.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.6: Integers and Booleans.

Lesson 1.6: Integers and Booleans

The purpose of this lesson is to explain the two most basic types of builtin
sorts in K, the Int sort and the Bool sort, representing
arbitrary-precision integers and Boolean algebra.

Builtin sorts in K

K provides definitions of some useful sorts in
domains.md, found in the
include/kframework/builtin directory of the K installation. This file is
defined via a
Literate programming
style that we will discuss in a future lesson. We will not cover all of the
sorts found there immediately. However, this lesson discusses some of the
details surrounding integers and Booleans, and explains how to look up more
detailed information about builtin functions in K's documentation.

Booleans in K

The most basic builtin sort K provides is the Bool sort, representing
Boolean values (i.e., true and false). You have already seen how we were
able to create this type ourselves using K's parsing and disambiguation
features. However, in the vast majority of cases, we prefer instead to import
the version of Boolean algebra defined by K itself. Most simply, you can do
this by importing the module BOOL in your definition. For example
(lesson-06-a.k):

module LESSON-06-A
  imports BOOL

  syntax Fruit ::= Blueberry() | Banana()
  syntax Bool ::= isBlue(Fruit) [function]

  rule isBlue(Blueberry()) => true
  rule isBlue(Banana()) => false
endmodule

Here we have defined a simple predicate, i.e., a function returning a
Boolean value. We are now able to perform the usual Boolean operations of
and, or, and not over these values. For example (lesson-06-b.k):

module LESSON-06-B
  imports BOOL

  syntax Fruit ::= Blueberry() | Banana()
  syntax Bool ::= isBlue(Fruit) [function]

  rule isBlue(Blueberry()) => true
  rule isBlue(Banana()) => false

  syntax Bool ::= isYellow(Fruit) [function]
                | isBlueOrYellow(Fruit) [function]

  rule isYellow(Banana()) => true
  rule isYellow(Blueberry()) => false

  rule isBlueOrYellow(F) => isBlue(F) orBool isYellow(F)
endmodule

In the above example, Boolean inclusive or is performed via the orBool
function, which is defined in the BOOL module. As a matter of convention,
many functions over builtin sorts in K are suffixed with the name of the
primary sort over which those functions are defined. This happens so that the
syntax of K does not (generally) conflict with the syntax of any other
programming language, which would make it harder to define that programming
language in K.

Exercise

Write a function isBlueAndNotYellow which computes the appropriate Boolean
expression. If you are unsure what the appropriate syntax is to use, you
can refer to the BOOL module in
domains.md. Add a term of
sort Fruit for which isBlue and isYellow both return true, and test that
the isBlueAndNotYellow function behaves as expected on all three Fruits.

Syntax Modules

For most sorts in domains.md, K defines more than one module that can be
imported by users. For example, for the Bool sort, K defines the BOOL
module that has already been discussed, but also provides the
BOOL-SYNTAX module. This module, unlike the BOOL module, only declares the
values true and false, but not any of the functions that operate over the
Bool sort. The rationale is that you may want to import this module into the
main syntax module of your definition in some cases, whereas you generally do
not want to do this with the version of the module that includes all the
functions over the Bool sort. For example, if you were defining the semantics
of C++, you might import BOOL-SYNTAX into the syntax module of your
definition, because true and false are part of the grammar of C++, but
you would only import the BOOL module into the main semantics module, because
C++ defines its own syntax for and, or, and not that is different from the
syntax defined in the BOOL module.

Here, for example, is how we might redefine our Boolean expression calculator
to use the Bool sort while maintaining an idiomatic structure of modules
and imports, for the first time including the rules to calculate the values of
expressions themselves (lesson-06-c.k):

module LESSON-06-C-SYNTAX
  imports BOOL-SYNTAX

  syntax Bool ::= "(" Bool ")" [bracket]
                > "!" Bool [function]
                > left:
                  Bool "&&" Bool [function]
                | Bool "^" Bool [function]
                | Bool "||" Bool [function]
endmodule

module LESSON-06-C
  imports LESSON-06-C-SYNTAX
  imports BOOL

  rule ! B => notBool B
  rule A && B => A andBool B
  rule A ^ B => A xorBool B
  rule A || B => A orBool B
endmodule

Note the encapsulation of syntax: the LESSON-06-C-SYNTAX module contains
exactly the syntax of our Boolean expressions, and no more, whereas any other
syntax needed to implement those functions is in the LESSON-06-C module
instead.

Exercise

Add an "implies" function to the above Boolean expression calculator, using the
-> symbol to represent implication. You can look up K's builtin "implies"
function in the BOOL module in domains.md.

Integers in K

Unlike most programming languages, where the most basic integer type is a
fixed-precision integer type, the most commonly used integer sort in K is
the Int sort, which represents the mathematical integers, i.e.,
arbitrary-precision integers.

K provides three main modules for import when using the Int sort. The first,
containing all the syntax of integers as well as all of the functions over
integers, is the INT module. The second, which provides just the syntax
of integer literals themselves, is the INT-SYNTAX module. However, unlike
most builtin sorts in K, K also provides a third module for the Int sort:
the UNSIGNED-INT-SYNTAX module. This module provides only the syntax of
non-negative integers, i.e., natural numbers. The reasons for this involve
lexical ambiguity. Generally speaking, in most programming languages, -1 is
not a literal, but instead a literal to which the unary negation operator is
applied. K thus provides this module to ease in specifying the syntax of such
languages.

For detailed information about the functions available over the Int sort,
refer to domains.md. Note again how we append Int to the end of most of the
integer operations to ensure they do not collide with the syntax of other
programming languages.
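
As a small sketch of this naming convention in use, the following module
defines two functions over Int. The module name INT-SKETCH and the functions
square and average are hypothetical; the operators +Int, *Int, and /Int come
from the INT module.

```k
// Hypothetical sketch, not one of the lesson files.
module INT-SKETCH
  imports INT

  syntax Int ::= square(Int) [function]
               | average(Int, Int) [function]

  // Builtin integer operators carry the Int suffix.
  rule square(I) => I *Int I
  rule average(A, B) => (A +Int B) /Int 2
endmodule
```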

Exercises

  1. Extend your solution from Lesson 1.4, Exercise 2 to implement the rules
    that define the behavior of addition, subtraction, multiplication, and
    division. Do not worry about the case when the user tries to divide by zero
    at this time. Use /Int to implement division. Test your new calculator
    implementation by executing the arithmetic expressions you wrote as part of
    Lesson 1.3, Exercise 2. Check to make sure each computes the value you expected.

  2. Combine the Boolean expression calculator from this lesson with your
    solution to Exercise 1, and then extend the combined calculator with the <,
    <=, >, >=, ==, and != expressions. Write some Boolean expressions
    that combine integer and Boolean operations, and test to ensure that these
    expressions return the expected truth value.

  3. Compute the following expressions using your solution from Exercise 2:
    7 / 3, 7 / -3, -7 / 3, -7 / -3. Then replace the /Int function in
    your definition with divInt instead, and observe how the value of the above
    expressions changes. Why does this occur?

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.7: Side Conditions and Rule Priority.

Lesson 1.7: Side Conditions and Rule Priority

The purpose of this lesson is to explain how to write conditional rules in K,
and to explain how to control the order in which rules are tried.

Side Conditions

So far, all of the rules we have discussed have been unconditional rules.
If the left-hand side of the rule matches the arguments to the function, the
rule applies. However, there is another type of rule, a conditional rule.
A conditional rule consists of a rule body containing the patterns to
match, and a side condition representing a Boolean expression that must
evaluate to true in order for the rule to apply.

Side conditions in K are introduced via the requires keyword immediately
following the rule body. For example, here is a rule with a side condition
(lesson-07-a.k):

module LESSON-07-A
  imports BOOL
  imports INT

  syntax Grade ::= "letter-A"
                 | "letter-B"
                 | "letter-C"
                 | "letter-D"
                 | "letter-F"
                 | gradeFromPercentile(Int) [function]

  rule gradeFromPercentile(I) => letter-A requires I >=Int 90
endmodule

In this case, the gradeFromPercentile function takes a single integer
argument. The function evaluates to letter-A if the argument passed is
greater than or equal to 90. Note that the side condition is allowed to refer to variables
that appear on the left-hand side of the rule. In the same manner as variables
appearing on the right-hand side, variables that appear in the side condition
evaluate to the value that was matched on the left-hand side. Then the
functions in the side condition are evaluated, which returns a term of sort
Bool. If the term is equal to true, then the rule applies. Bear in mind
that the side condition is only evaluated at all if the patterns on the
left-hand side of the rule match the term being evaluated.
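
Side conditions can also combine several comparisons using the functions from
the BOOL module. The following is a minimal sketch on an unrelated function;
the module name SIDE-CONDITION-SKETCH and the function clamp are hypothetical
and chosen only for illustration.

```k
// Hypothetical sketch, not one of the lesson files.
module SIDE-CONDITION-SKETCH
  imports BOOL
  imports INT

  // clamp restricts its argument to the range [0, 100].
  syntax Int ::= clamp(Int) [function]

  // The three side conditions are mutually exclusive, so at most one
  // rule applies to any given argument.
  rule clamp(I) => 0   requires I <Int 0
  rule clamp(I) => I   requires I >=Int 0 andBool I <=Int 100
  rule clamp(I) => 100 requires I >Int 100
endmodule
```

Keeping the side conditions mutually exclusive is a deliberate choice here: it
means the function gives the same result no matter which order the rules are
tried in.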

Exercise

Write a rule that evaluates gradeFromPercentile to letter-B if the argument
to the function is in the range [80,90). Test that the function correctly
evaluates various numbers between 80 and 100.

owise Rules

So far, all the rules we have introduced have had the same priority. This
means that K does not necessarily enforce an order in which the rules are
tried. We have only discussed functions so far, so it may not be immediately
clear why this choice was made, given that a function is not considered
well-defined if multiple rules for evaluating it can evaluate the same
arguments to different results. However, in future lessons we will discuss
other types of rules in K, some of which can be non-deterministic. If more
than one such rule is capable of matching, K will explore each matching rule
in parallel and consider each of their respective results when executing your
program. Don't worry too much about this right now; just understand that
because of the potential for nondeterminism later, we don't enforce a total
ordering on the rules that K attempts to apply.

However, sometimes this is not practical. It can be very convenient to express
that a particular rule applies only if no other rules for that function are
applicable. This can be expressed by adding the owise attribute to a rule.
In practice, this means that the rule has lower priority than other rules,
and will only be tried after all the other, higher-priority rules have been
tried and have failed.

For example, in the above exercise, we had to add a side condition containing
two Boolean comparisons to the rule we wrote to handle letter-B grades.
However, in practice this meant that we compare the percentile to 90 twice. We
can more efficiently and more idiomatically write the letter-B case for the
gradeFromPercentile rule using the owise attribute (lesson-07-b.k):

module LESSON-07-B
  imports BOOL
  imports INT

  syntax Grade ::= "letter-A"
                 | "letter-B"
                 | "letter-C"
                 | "letter-D"
                 | "letter-F"
                 | gradeFromPercentile(Int) [function]

  rule gradeFromPercentile(I) => letter-A requires I >=Int 90
  rule gradeFromPercentile(I) => letter-B requires I >=Int 80 [owise]
endmodule

This rule is saying, "if all the other rules do not apply, then the grade is a
B if the percentile is greater than or equal to 80." Note here that we use both
a side condition and an owise attribute on the same rule. This is not
required (as we will see later), but it is allowed. What this means is that the
side condition is only tried if the other rules did not apply and the
left-hand side of the rule matched. You can even use more complex matching on
the left-hand side than simply a variable. More generally, you can also have
multiple higher-priority rules, or multiple owise rules. What this means in
practice is that all of the non-owise rules are tried first, in any order,
followed by all the owise rules, in any order.
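To show an owise rule that has no side condition, here is a minimal sketch. The module name and the isZero function are assumptions made for this example. The pattern _ used here simply matches any argument.

```k
module LESSON-07-SKETCH-B
  imports BOOL
  imports INT

  // isZero is a hypothetical function used only for illustration.
  syntax Bool ::= isZero(Int) [function]

  // Regular rule: tried first, and matches only the literal 0.
  rule isZero(0) => true
  // owise rule with no side condition: tried only after the rule above
  // has failed to apply, and matches any argument.
  rule isZero(_) => false [owise]
endmodule
```

Without the owise attribute, both rules would match isZero(0) and produce different results, so the function would not be well-defined.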

Exercise

The grades D and F correspond to the percentile ranges [60, 70) and [0, 60)
respectively. Write another implementation of gradeFromPercentile which
handles only these cases, and uses the owise attribute to avoid redundant
Boolean comparisons. Test that various percentiles in the range [0, 70) are
evaluated correctly.

Rule Priority

As it happens, the owise attribute is a specific case of a more general
concept we call rule priority. In essence, each rule is assigned an integer
priority. Rules are tried in increasing order of priority, starting with a
rule with priority zero, and trying each increasing numerical value
successively.

By default, a rule is assigned a priority of 50. If the rule has the owise
attribute, it is instead given the priority 200. You can see why this will
cause owise rules to be tried after regular rules.

However, it is also possible to directly assign a numerical priority to a rule
via the priority attribute. For example, here is an alternative way
we could express the same two rules in the gradeFromPercentile function
(lesson-07-c.k):

module LESSON-07-C
  imports BOOL
  imports INT

  syntax Grade ::= "letter-A"
                 | "letter-B"
                 | "letter-C"
                 | "letter-D"
                 | "letter-F"
                 | gradeFromPercentile(Int) [function]

  rule gradeFromPercentile(I) => letter-A requires I >=Int 90 [priority(50)]
  rule gradeFromPercentile(I) => letter-B requires I >=Int 80 [priority(200)]
endmodule

We can, of course, assign a priority equal to any non-negative integer. For
example, here is a more complex example that handles the remaining grades
(lesson-07-d.k):

module LESSON-07-D
  imports BOOL
  imports INT

  syntax Grade ::= "letter-A"
                 | "letter-B"
                 | "letter-C"
                 | "letter-D"
                 | "letter-F"
                 | gradeFromPercentile(Int) [function]

  rule gradeFromPercentile(I) => letter-A requires I >=Int 90 [priority(50)]
  rule gradeFromPercentile(I) => letter-B requires I >=Int 80 [priority(51)]
  rule gradeFromPercentile(I) => letter-C requires I >=Int 70 [priority(52)]
  rule gradeFromPercentile(I) => letter-D requires I >=Int 60 [priority(53)]
  rule gradeFromPercentile(_) => letter-F                     [priority(54)]
endmodule

Note that we have introduced a new piece of syntax here: _. This is actually
just a variable. However, as a special case, when a variable is named _, it
does not bind a value that can be used on the right-hand side of the rule, or
in a side condition. Effectively, _ is a placeholder variable that means "I
don't care about this term."
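A minimal sketch of the placeholder variable in a rule with two arguments follows. The module name and the first function are assumptions made for this example.

```k
module LESSON-07-SKETCH-C
  imports INT

  // first is a hypothetical function used only for illustration.
  syntax Int ::= first(Int, Int) [function]

  // The second argument is matched but never used, so it is written as _.
  rule first(I, _) => I
endmodule
```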

In this example, we have explicitly expressed the order in which the rules of
this function are tried. Since rules are tried in increasing numerical
priority, we first try the rule with priority 50, then 51, then 52, 53, and
finally 54.

As a final note, remember that if you assign a rule a priority higher than 200,
it will be tried after a rule with the owise attribute, and if you assign
a rule a priority less than 50, it will be tried before a rule with no
explicit priority.
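The interaction between explicit priorities, the default priority, and owise can be sketched as follows. The module name, the Size sort, and the classify function are assumptions made for this example.

```k
module LESSON-07-SKETCH-D
  imports BOOL
  imports INT

  // Size and classify are hypothetical names used only for illustration.
  syntax Size ::= "tiny" | "small" | "large" | "huge"
                | classify(Int) [function]

  // Priority 25 is below the default of 50, so this rule is tried first.
  rule classify(I) => tiny  requires I <Int 10   [priority(25)]
  // No explicit priority: this rule gets the default priority of 50.
  rule classify(I) => small requires I <Int 100
  // owise corresponds to priority 200.
  rule classify(I) => large requires I <Int 1000 [owise]
  // Priority 250 is above 200, so this rule is tried after the owise rule.
  rule classify(_) => huge                       [priority(250)]
endmodule
```

Under this sketch, classify(5), classify(50), classify(500), and classify(5000) should evaluate to tiny, small, large, and huge respectively, because each rule is only reached once every lower-numbered rule has failed to apply.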

Exercises

  1. Write a function isEven that returns whether an integer is an even number.
    Use two rules and one side condition. The right-hand side of the rules should
    be Boolean literals. Refer back to
    domains.md for the relevant
    integer operations.

  2. Modify the calculator application from Lesson 1.6, Exercise 2, so that division
    by zero will no longer make krun crash with a "Division by zero" exception.
    Instead, the / function should not match any of its rules if the denominator
    is zero.

  3. Write your own implementation of ==, <, <=, >, >= for integers and modify your solution from Exercise 2 to use it.
    You can use any arithmetic operations in the INT module, but do not use any built-in boolean functions for comparing integers.

    Hint: Use pattern matching and recursive definitions with rule priorities.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.8: Literate Programming with Markdown.

Lesson 1.8: Literate Programming with Markdown

The purpose of this lesson is to teach a paradigm for performing literate
programming in K, and explain how this can be used to create K definitions
that are also documentation.

Markdown and K

The K tutorial so far has been written in
Markdown. Markdown,
for those not already familiar, is a lightweight plain-text format for styling
text. From this point onward, we assume you are familiar with Markdown and how
to write Markdown code. You can refer to the above link for a tutorial if you
are not already familiar.

What you may not necessarily realize, however, is that the K tutorial is also
a sequence of K definitions written in the manner of
Literate Programming.
For detailed information about Literate Programming, you can read the linked
Wikipedia article, but the short summary is that literate programming is a way
of intertwining documentation and code together in a manner that allows
executable code to also be, simultaneously, a documented description of that
code.

K is provided with built-in support for literate programming using Markdown.
By default, if you pass a file with the .md file extension to kompile, it
will look for any code blocks containing k code in that file, extract out
that K code into pure K, and then compile it as if it were a .k file.

A K code block begins with a line of text containing the keyword ```k,
and ends when it encounters another ``` keyword.

For example, if you view the markdown source of this document, this is a K
code block:

module LESSON-08
  imports INT

Only the code inside K code blocks will actually be sent to the compiler. The
rest, while it may appear in the document when rendered by a markdown viewer,
is essentially a form of code comment.

When you have multiple K code blocks in a document, K will append each one
together into a single file before passing it off to the outer parser.

For example, the following code block contains sentences that are part of the
LESSON-08 module whose declaration we began above:

  syntax Int ::= Int "+" Int [function]
  rule I1 + I2 => I1 +Int I2

Exercise

Compile this file with kompile README.md --main-module LESSON-08. Confirm
that you can use the resulting compiled definition to evaluate the +
function.

Markdown Selectors

On occasion, you may want to generate multiple K definitions from a single
Markdown file. You may also wish to include a block of syntax-highlighted K
code that nonetheless does not appear as part of your K definition. It is
possible to accomplish this by means of the built-in support for syntax
highlighting in Markdown. Markdown allows a code block that was begun with
``` to be immediately followed by a string which is used to signify what
programming language the following code is written in. However, this feature
actually allows arbitrary text to appear describing that code block. Markdown
parsers are able to parse this text and render the code block differently
depending on what text appears after the backticks.

In K, you can use this functionality to specify one or more
Markdown selectors which are used to describe the code block. A Markdown
selector consists of a sequence of characters containing letters, numbers, and
underscores. A code block can be designated with a single selector by appending
the selector immediately following the backticks that open the code block.

For example, here is a code block with the foo selector:

foo bar

Note that this is not K code. By convention, K code should have the k
selector on it. You can express multiple selectors on a code block by putting
them between curly braces and prepending each with the . character. For
example, here is a code block with the foo and k selectors:

  syntax Int ::= foo(Int) [function]
  rule foo(0) => 0

Because this code block contains the k Markdown selector, by default it is
included as part of the K definition being compiled.

Exercise

Confirm this fact by using krun to evaluate foo(0).

Markdown Selector Expressions

By default, as previously stated, K includes in the definition any code block
with the k selector. However, this is merely a specific instance of a general
principle, namely, that K allows you to control which selectors get included
in your K definition. This is done by means of the --md-selector flag to
kompile. This flag accepts a Markdown selector expression, which you
can essentially think of as a kind of Boolean algebra over Markdown selectors.
Each selector becomes an atom, and you can combine these atoms via the &,
|, !, and () operators.

Here is a grammar, written in K, of the language of Markdown selector
expressions:

  syntax Selector ::= r"[0-9a-zA-Z_]+" [token]
  syntax SelectorExp ::= Selector
                       | "(" SelectorExp ")" [bracket]
                       > right:
                         "!" SelectorExp
                       > right:
                         SelectorExp "&" SelectorExp
                       > right:
                         SelectorExp "|" SelectorExp

Here is a selector expression that selects all the K code blocks in this
definition except the one immediately above:

k & (! selector)

Addendum

This code block exists in order to make the above lesson a syntactically valid
K definition. Consider why it is necessary.

endmodule

Exercises

  1. Compile this lesson with the selector expression k & (! foo) and confirm
    that you get a parser error if you try to evaluate the foo function with the
    resulting definition.

  2. Compile Lesson 1.3
    as a K definition. Identify why it fails to compile. Then pass an appropriate
    --md-selector to the compiler in order to make it compile.

  3. Modify your calculator application from Lesson 1.7, Exercise 2, to be written
    in a literate style. Consider what text might be appropriate to turn the
    resulting markdown file into documentation for your calculator.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.9: Unparsing and the format and color attributes.

Lesson 1.9: Unparsing and the format and color attributes

The purpose of this lesson is to teach the user about how terms are
pretty-printed in K, and how the user can make adjustments to the default
settings for how to print specific terms.

Parsing, Execution, and Unparsing

When you use krun to interpret a program, the tool passes through three major
phases. In the first, parsing, the program itself is parsed using either kast
or an ahead-of-time parser generated via Bison, and the resulting AST becomes
the input to the interpreter. In the second phase, execution, K evaluates
functions and (as we will discuss in depth later) performs rewrite steps to
iteratively transform the program state. The third and final phase is called
unparsing, because it consists of taking the final state of the application
after the program has been interpreted, and converting it from an AST back into
text that (in theory, anyway) could be parsed back into the same AST that was
the output of the execution phase.

In practice, parsing is not always precisely reversible. It turns out
(although we are not going to cover exactly why here) that constructing a
sound algorithm that takes a grammar and an AST and emits text that could be
parsed via that grammar to the original AST is an NP-hard problem. As a
result, in the interests of avoiding exponential-time algorithms when users
rarely care about unparsing being completely sound, we take certain shortcuts
that provide a linear-time algorithm. This algorithm approximates a sound
solution to the problem, at the cost of no longer guaranteeing that the result
can be parsed into the exact original term in all cases.

This is a lot of theoretical explanation, but at root, the unparsing process
is fairly simple: it takes a K term that is the output of execution and pretty
prints it according to the syntax defined by the user in their K definition.
This is useful because the original AST is not terribly user-readable, and it
is difficult to visualize the entire term or decipher information about the
final state of the program at a quick glance. Of course, in rare cases, the
pretty-printed configuration loses information of relevance, which is why K
allows you to obtain the original AST on request.

As an example of all of this, consider the following K definition
(lesson-09-a.k):

module LESSON-09-A
  imports BOOL

  syntax Exp ::= "(" Exp ")" [bracket]
               | Bool
               > "!" Exp
               > left:
                 Exp "&&" Exp
               | Exp "^" Exp
               | Exp "||" Exp

  syntax Exp ::= id(Exp) [function]
  rule id(E) => E
endmodule

This is similar to the grammar we defined in LESSON-06-C, with the difference
that the Boolean expressions are now constructors of sort Exp and we define a
trivial function over expressions that returns its argument unchanged.

We can now parse a simple program in this definition and use it to unparse some
Boolean expressions. For example (exp.bool):

id(true&&false&&!true^(false||true))

Here is a program that is not particularly legible at first glance, because all
extraneous whitespace has been removed. However, if we run krun exp.bool, we
see that the unparser pretty-prints this expression rather nicely:

<k>
  true && false && ! true ^ ( false || true ) ~> .
</k>

Notably, not only does K insert whitespace where appropriate, it is also smart
enough to insert parentheses where necessary in order to ensure the correct
parse. For example, without those parentheses, the expression above would parse
equivalently to the following one:

(((true && false) && ! true) ^ false) || true

Indeed, you can confirm this by passing that exact expression to the id
function and evaluating it, then looking at the result of the unparser:

<k>
  true && false && ! true ^ false || true ~> .
</k>

Here, because the meaning of the AST is the same both with and without
parentheses, K does not insert any parentheses when unparsing.

Exercise

Modify the grammar of LESSON-09-A above so that the binary operators are
right associative. Try unparsing exp.bool again, and note how the result is
different. Explain the reason for the difference.

Custom unparsing of terms

You may have noticed that right now, the unparsing of terms is not terribly
imaginative. All it is doing is taking each child of the term, inserting it
into the non-terminal positions of the production, then printing the production
with a space between each terminal or non-terminal. It is easy to see why this
might not be desirable in some cases. Consider the following K definition
(lesson-09-b.k):

module LESSON-09-B
  imports BOOL

  syntax Stmt ::= "{" Stmt "}" | "{" "}"
                > right:
                  Stmt Stmt
                | "if" "(" Bool ")" Stmt
                | "if" "(" Bool ")" Stmt "else" Stmt [avoid]
endmodule

This is a statement grammar, simplified to the point of meaninglessness, but
still useful as an object lesson in unparsing. Consider the following program
in this grammar (if.stmt):

if (true) {
  if (true) {}
  if (false) {}
  if (true) {
    if (false) {} else {}
  } else {
    if (false) {}
  }
}

This is how that term would be unparsed if it appeared in the output of krun:

if ( true ) { if ( true ) { } if ( false ) { } if ( true ) { if ( false ) { } else { } } else { if ( false ) { } } }

This is clearly much less legible than we started with! What are we to do?
Well, K provides an attribute, format, that can be applied to any production,
which controls how that production gets unparsed. You've seen how it gets
unparsed by default, but via this attribute, the developer has complete control
over how the term is printed. Of course, the user can trivially create ways to
print terms that would not parse back into the same term. Sometimes this is
even desirable. But in most cases, what you are interested in is controlling
the line breaking, indentation, and spacing of the production.

Here is an example of how you might choose to apply the format attribute
to improve how the above term is unparsed (lesson-09-c.k):

module LESSON-09-C
  imports BOOL

  syntax Stmt ::= "{" Stmt "}" [format(%1%i%n%2%d%n%3)] | "{" "}" [format(%1%2)]
                > right:
                  Stmt Stmt [format(%1%n%2)]
                | "if" "(" Bool ")" Stmt [format(%1 %2%3%4 %5)]
                | "if" "(" Bool ")" Stmt "else" Stmt [avoid, format(%1 %2%3%4 %5 %6 %7)]
endmodule

If we compile this new definition and unparse the same term, this is the
result we get:

if (true) {
  if (true) {}
  if (false) {}
  if (true) {
    if (false) {} else {}
  } else {
    if (false) {}
  }
}

This is the exact same text we started with! By adding the format attributes,
we were able to indent the body of code blocks, adjust the spacing of if
statements, and put each statement on a new line.

How exactly was this achieved? Well, each time the unparser reaches a term,
it looks at the format attribute of that term. That format attribute is a
mix of characters and format codes. Format codes begin with the %
character. Each character in the format attribute other than a format code is
appended verbatim to the output, and each format code is handled according to
its meaning, transformed (possibly recursively) into a string of text, and
spliced into the output at the position the format code appears in the format
string.

Provided for reference is a table with a complete list of all valid format
codes, followed by their meaning:

| Format Code | Meaning |
|-------------|---------|
| n | Insert '\n' followed by the current indentation level |
| i | Increase the current indentation level by 1 |
| d | Decrease the current indentation level by 1 |
| c | Move to the next color in the list of colors for this production (see next section) |
| r | Reset color to the default foreground color for the terminal (see next section) |
| an integer | Print a terminal or non-terminal from the production. The integer is treated as a 1-based index into the terminals and non-terminals of the production. If the offset refers to a terminal, move to the next color in the list of colors for this production, print the value of that terminal, then reset the color to the default foreground color for the terminal. If the offset refers to a regular expression terminal, it is an error. If the offset refers to a non-terminal, unparse the corresponding child of the current term (starting with the current indentation level) and print the resulting text, then set the current color and indentation level to the color and indentation level following unparsing that term. |
| other char | Print that character verbatim |
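As an illustration of how these codes combine, here is a small annotated sketch. The module name and the while production are assumptions made for this example; the format strings mirror the ones used in LESSON-09-C.

```k
module LESSON-09-SKETCH-A
  imports BOOL

  // Empty block: print both terminals with nothing between them.
  //   %1 prints {, %2 prints }
  //
  // Non-empty block:
  //   %1 prints {, %i increases the indentation level,
  //   %n starts a new line at the new indentation level,
  //   %2 unparses the Stmt child, %d decreases the indentation level,
  //   %n starts a new line at the outer indentation level, %3 prints }
  //
  // while statement (hypothetical production):
  //   %1 prints while, followed by a literal space,
  //   %2%3%4 print ( the Bool child and ) with no spaces between them,
  //   then a literal space, and %5 unparses the body.
  syntax Stmt ::= "{" "}"                   [format(%1%2)]
                | "{" Stmt "}"              [format(%1%i%n%2%d%n%3)]
                | "while" "(" Bool ")" Stmt [format(%1 %2%3%4 %5)]
endmodule
```

With these attributes, a term parsed from while(true){} would be expected to unparse as while (true) {}, by analogy with the if production in LESSON-09-C.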

Exercise

Change the format attributes for LESSON-09-C so that if.stmt will unparse
as follows:

if (true)
{
  if (true)
  {
  }
  if (false)
  {
  }
  if (true)
  {
    if (false)
    {
    }
    else
    {
    }
  }
  else
  {
    if (false)
    {
    }
  }
}

Output coloring

When the output of unparsing is displayed on a terminal supporting colors, K
is capable of coloring the output, similar to what is possible with a syntax
highlighter. This is achieved via the color and colors attributes.

Essentially, both the color and colors attributes are used to construct a
list of colors associated with each production, and then the format attribute
is used to control how those colors are used to unparse the term. At its most
basic level, you can set the color attribute to color all the terminals in
the production a certain color, or you can use the colors attribute to
specify a comma-separated list of colors for each terminal in the production.
At a more advanced level, the %c and %r format codes control how the
formatter interacts with the list of colors specified by the colors
attribute. You can essentially think of the color attribute as a way of
specifying that you want all the colors in the list to be the same color.

Note that the %c and %r format codes are relatively primitive in nature.
The color and colors attributes merely maintain a list of colors, whereas
the %c and %r format codes merely control how to advance through that list
and how individual text is colored.

It is an error if the colors attribute does not provide all the colors needed
by the terminals and escape codes in the production. %r does not change the
position in the list of colors at all, so the next %c will advance to the
following color.
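The example that follows uses only the color attribute. As a complement, here is a minimal sketch of the colors attribute, which lists one color per terminal. The module name and the if/then/else production are assumptions made for this example, and the color names are limited to ones that appear elsewhere in this lesson.

```k
module LESSON-09-SKETCH-B
  imports BOOL

  // Hypothetical production with three terminals: if, then, and else.
  // The colors list supplies one color per terminal, in order:
  // if is printed in blue, then in green, and else in red.
  syntax Exp ::= Bool
               | "if" Exp "then" Exp "else" Exp [colors(blue, green, red)]
endmodule
```

Writing color(blue) on the same production instead would print all three terminals in blue.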

As a complete example, here is a variant of LESSON-09-A which colors the
various boolean operators:

module LESSON-09-D
  imports BOOL

  syntax Exp ::= "(" Exp ")" [bracket]
               | Bool
               > "!" Exp [color(yellow)]
               > left:
                 Exp "&&" Exp [color(red)]
               | Exp "^" Exp [color(blue)]
               | Exp "||" Exp [color(green)]

  syntax Exp ::= id(Exp) [function]
  rule id(E) => E
endmodule

For a complete list of allowed colors, see
here.

Exercises

  1. Use the color attribute on LESSON-09-C to color the keywords true and
    false one color, the keywords if and else another color, and the operators
    (, ), {, and } a third color.

  2. Use the format, color, and colors attributes to tell the unparser to
    style the expression grammar from Lesson 1.8, Exercise 3 according to your own
    personal preferences for syntax highlighting and code formatting. You can
    view the result of the unparser on a function term without evaluating that
    function by means of the command kparse <file> | kore-print -.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.10: Strings.

Lesson 1.10: Strings

The purpose of this lesson is to explain how to use the String sort in K to
represent sequences of characters, and explain where to find additional
information about builtin functions over strings.

The String Sort

In addition to the Int and Bool sorts covered in
Lesson 1.6, K provides, among others, the
String sort to represent sequences of characters. You can import this
functionality via the STRING-SYNTAX module, which contains the syntax of
string literals in K, and the STRING module, which contains all the functions
that operate over the String type.

Strings in K are double-quoted. The following list of escape sequences is
supported:

| Escape Sequence | Meaning |
|-----------------|---------|
| \" | The literal character " |
| \\ | The literal character \ |
| \n | The newline character (ASCII code 0x0a) |
| \r | The carriage return character (ASCII code 0x0d) |
| \t | The tab character (ASCII code 0x09) |
| \f | The form feed character (ASCII code 0x0c) |
| \x00 | \x followed by 2 hexadecimal digits indicates a code point between 0x00 and 0xFF |
| \u0000 | \u followed by 4 hexadecimal digits indicates a code point between 0x0000 and 0xFFFF |
| \U00000000 | \U followed by 8 hexadecimal digits indicates a code point between 0x000000 and 0x10FFFF |

Please note that K's Unicode support is currently incomplete, so you may run
into errors using code points greater than 0xff.

As an example, you can construct a string literal containing the following
block of text:

This is an example block of text.
Here is a quotation: "Hello world."
	This line is indented.
ÁÉÍÓÚ

Like so:

"This is an example block of text.\nHere is a quotation: \"Hello world.\"\n\tThis line is indented.\n\xc1\xc9\xcd\xd3\xda\n"

Basic String Functions

The full list of functions provided for the String sort can be found in
domains.md, but here we
describe a few of the more basic ones.

String concatenation

The concatenation operator for strings is +String. For example, consider
the following K rule that constructs a string from component parts
(lesson-10.k):

module LESSON-10
  imports STRING

  syntax String ::= msg(String) [function]
  rule msg(S) => "The string you provided: " +String S +String "\nHave a nice day!"
endmodule

Note that this operator is O(N), so repeated concatenations are inefficient.
For information about efficient string concatenation, refer to
Lesson 2.14.

String length

The function to return the length of a string is lengthString. For example,
lengthString("foo") will return 3, and lengthString("") will return 0.
The return value is the length of the string in code points.

Substring computation

The function to compute the substring of a string is substrString. It
takes two string indices, starting from 0, and returns the substring within the
range [start..end). It is only defined if end >= start, start >= 0, and
end <= length of string. Here, for example, we return the first 5 characters
of a string:

substrString(S, 0, 5)

Here we return all but the first 3 characters:

substrString(S, 3, lengthString(S))
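Because substrString is only defined for indices within bounds, it can be useful to guard calls to it with a side condition. The following is a minimal sketch of that idea; the module name and the takeAtMost and dropLast functions are assumptions made for this example.

```k
module LESSON-10-SKETCH
  imports BOOL
  imports INT
  imports STRING

  // takeAtMost and dropLast are hypothetical helpers used only for
  // illustration.
  syntax String ::= takeAtMost(String, Int) [function]
                  | dropLast(String)        [function]

  // Return the first N characters when N is within bounds.
  rule takeAtMost(S, N) => substrString(S, 0, N)
    requires N >=Int 0 andBool N <=Int lengthString(S)
  // Return the whole string when N exceeds its length.
  rule takeAtMost(S, N) => S
    requires N >Int lengthString(S)

  // Drop the final character; no rule applies to the empty string.
  rule dropLast(S) => substrString(S, 0, lengthString(S) -Int 1)
    requires lengthString(S) >Int 0
endmodule
```

Under this sketch, takeAtMost("hello world", 5) should evaluate to "hello" and dropLast("foo") to "fo". Neither function has a rule for the out-of-range cases (a negative N, or an empty string passed to dropLast), so those calls are simply left unevaluated.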

Exercises

  1. Write a function that takes a paragraph of text (i.e., a sequence of
    sentences, each ending in a period), and constructs a new (nonsense) sentence
    composed of the first word of each sentence, followed by a period. Do not
    worry about capitalization or periods within the sentence which do not end the
    sentence (e.g. "Dr."). You can assume that all whitespace within the paragraph
    consists of spaces. For more information about the functions over strings required to
    implement such a function, refer to domains.md.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.11: Casting Terms.

Lesson 1.11: Casting Terms

The purpose of this lesson is to explain how to use cast expressions in
order to disambiguate terms using sort information. We also explain how the
variable sort inference algorithm works in K, and how to change the default
behavior by casting variables to a particular sort.

Casting in K

Sometimes the grammar you write for your rules in K can be a little bit
ambiguous on purpose. While grammars for programming languages may be
unambiguous when considered in their entirety, K allows you to write rules
involving arbitrary fragments of that grammar, and those fragments can
sometimes be ambiguous by themselves, or similar enough to other fragments
of the grammar to trigger ambiguity. As a result, in addition to the tools
covered in Lesson 1.4, K provides one
additional powerful tool for disambiguation: cast expressions.

K provides three main types of casts: the semantic cast, the strict cast, and
the projection cast. We will cover each of them, and their similarities and
differences, in turn.

Semantic casts

The most basic, and most common, type of cast in K is called the
semantic cast. For every sort S declared in a module, K provides the
following (implicit) production for use in sentences:

  syntax S ::= S ":S"

Note that S simply represents the name of the sort. For example, if we
defined a sort Exp, the actual production for that sort would be:

  syntax Exp ::= Exp ":Exp"

At runtime, this expression will not actually exist; it is merely an annotation
to the compiler describing the sort of the term inside the cast. It is telling
the compiler that the term inside the cast must be of sort Exp. For example,
if we had the following grammar:

module LESSON-11-A
  imports INT

  syntax Exp ::= Int | Exp "+" Exp
  syntax Stmt ::= "if" "(" Exp ")" Stmt | "{" "}"
endmodule

Then we would be able to write 1:Exp, or (1 + 2):Exp, but not {}:Exp.

You can also restrict the sort that a variable in a rule will match by casting
it. For example, consider the following additional module:

module LESSON-11-B
  imports LESSON-11-A
  imports BOOL

  syntax Term ::= Exp | Stmt
  syntax Bool ::= isExpression(Term) [function]

  rule isExpression(_E:Exp) => true
  rule isExpression(_) => false [owise]
endmodule

Here we have defined a very simple function that decides whether a term is
an expression or a statement. It does this by casting the variable inside the
isExpression rule to sort Exp. As a result, that variable will only match terms
of sort Exp. Thus, isExpression(1) will return true, as will isExpression(1 + 2), but
isExpression({}) will return false.
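The same technique works for any subsort. Here is a minimal sketch that distinguishes integer literals from other expressions; the module name and the isLiteral function are assumptions made for this example, and the Exp grammar is repeated from LESSON-11-A so that the module stands alone.

```k
module LESSON-11-SKETCH
  imports BOOL
  imports INT

  syntax Exp ::= Int | Exp "+" Exp

  // isLiteral is a hypothetical function used only for illustration.
  syntax Bool ::= isLiteral(Exp) [function]

  // The cast restricts the variable to terms of sort Int, so this rule
  // does not match a term built with the + production.
  rule isLiteral(_:Int) => true
  rule isLiteral(_) => false [owise]
endmodule
```

Under this sketch, isLiteral(1) should return true, while isLiteral(1 + 2) should return false.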

Exercise

Verify this fact for yourself by running isExpression on the above examples. Then
write an isStatement function, and test that it works as expected.

Strict casts

On occasion, a semantic cast is not strict enough. It might be that you want
to, for disambiguation purposes, say exactly what sort a term is. For
example, consider the following definition:

module LESSON-11-C
  imports INT

  syntax Exp ::= Int
               | "add[" Exp "," Exp "]"   [group(exp)]
  syntax Exp2 ::= Exp
               | "add[" Exp2 "," Exp2 "]" [group(exp2)]
endmodule

This grammar is a little ambiguous and contrived, but it serves to demonstrate
how a semantic cast might be insufficient to disambiguate a term. If we were
to write the term add[ I1:Int , I2:Int ]:Exp2, the term would be ambiguous,
because the cast is not sufficiently strict to determine whether you mean
to derive the "add" production defined in group exp or the one in group exp2.

In this situation, there is a solution: the strict cast. For every sort
S in your grammar, K also defines the following production:

  syntax S ::= S "::S"

This may at first glance seem the same as the semantic cast. And indeed,
from the perspective of the grammar and from the perspective of rewriting,
the two are in fact identical. However, the strict cast has a distinct meaning
in the type system of K: namely, the sort of the term inside the cast cannot
be a strict subsort of S, i.e., another sort S2 such that the production
syntax S ::= S2 exists.

As a result, if we were to write in the above grammar the term
add[ I1:Int , I2:Int ]::Exp2, then we would know that the second derivation above
should be chosen, whereas if we want the first derivation, we could write
add[ I1:Int , I2:Int ]::Exp.

Care must be taken when using a strict cast with brackets. For example, consider a
similar grammar but using an infix "+":

module LESSON-11-D
  imports INT

  syntax Exp ::= Int
               | Exp "+" Exp   [group(exp)]
  syntax Exp2 ::= Exp
               | Exp2 "+" Exp2 [group(exp2)]
               | "(" Exp2 ")"  [bracket]
endmodule

The term I1:Int + I2:Int is ambiguous and could refer to either the production
in group exp or the one in group exp2. To differentiate, you might try to write
(I1:Int + I2:Int)::Exp2 similarly to the previous example.

Unfortunately though, this is still ambiguous. Here, the strict cast ::Exp2 applies
directly to the brackets themselves rather than the underlying term within those brackets.
As a result, it enforces that the sort of (I1:Int + I2:Int) cannot be a strict subsort of Exp2, but
it has no effect on the sort of the subterm I1:Int + I2:Int.

For cases like this, K provides an alternative syntax for strict casts:

  syntax S ::= "{" S "}::S"

The ambiguity can then be resolved with {I1:Int + I2:Int}::Exp or {I1:Int + I2:Int}::Exp2.

Projection casts

Thus far we have focused entirely on casts which exist solely to inform the
compiler about the sort of terms. However, sometimes when dealing with grammars
containing subsorts, it can be desirable to reason with the subsort production
itself, which injects one sort into another. Remember from above that such
a production looks like syntax S ::= S2. This type of production, called a
subsort production, can be thought of as a type of inheritance involving
constructors. If we have the above production in our grammar, we say that S2
is a subsort of S, or that any S2 is also an S. K implicitly maintains a
symbol at runtime which keeps track of where such subsortings occur; this
symbol is called an injection.

Sometimes, when one sort is a subsort of another, it can be the case that
a function returns one sort, but you actually want to cast the result of
calling that function to another sort which is a subsort of the first sort.
This is similar to what happens with inheritance in an object-oriented
language, where you might cast a superclass to a subclass if you know for
sure the object at runtime is in fact an instance of that class.

K provides something similar for subsorts: the projection cast.

For each pair of sorts S and S2, K provides the following production:

  syntax S ::= "{" S2 "}" ":>S"

What this means is that you take any term of sort S2 and cast it to sort
S. If the term of sort S2 consists of an injection containing a term of sort
S, then this will return that term. Otherwise, an error occurs and rewriting
fails, returning the projection function which failed to apply. The sort is
not actually checked at compilation time; rather, it is a runtime check
inserted into the code that runs when the rule applies.

For example, here is a module that makes use of projection casts:

module LESSON-11-E
  imports INT
  imports BOOL

  syntax Exp ::= Int | Bool | Exp "+" Exp | Exp "&&" Exp

  syntax Exp ::= eval(Exp) [function]
  rule eval(I:Int) => I
  rule eval(B:Bool) => B
  rule eval(E1 + E2) => {eval(E1)}:>Int +Int {eval(E2)}:>Int
  rule eval(E1 && E2) => {eval(E1)}:>Bool andBool {eval(E2)}:>Bool
endmodule

Here we have defined constructors for a simple expression language over
Booleans and integers, as well as a function eval that evaluates these
expressions to a value. Because that value could be an integer or a Boolean,
we need the casts in the last two rules in order to meet the type signature of
+Int and andBool. Of course, the user can write ill-formed expressions like
1 && true or false + true, but these will cause errors at runtime, because
the projection cast will fail.
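
To make the role of the projection casts concrete, the following hand-written
trace sketches how eval(1 + 2) might reduce under LESSON-11-E. It is an
illustration only, not tool output: the intermediate terms are written in K
syntax, and the order in which the nested function calls are evaluated is an
assumption made here for readability.

```k
// Hand-written reduction sketch for LESSON-11-E (not produced by krun).
eval(1 + 2)
  => {eval(1)}:>Int +Int {eval(2)}:>Int   // rule for eval(E1 + E2)
  => {1}:>Int +Int {2}:>Int               // rule for eval(I:Int), applied to each operand
  => 1 +Int 2                             // both projections succeed: each term is an Int injected into Exp
  => 3                                    // built-in integer addition
```

In the ill-formed case eval(1 && true), the first operand would evaluate to
the integer 1, so the projection {1}:>Bool could not succeed and rewriting
would fail at that point.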

Exercises

  1. Extend the eval function in LESSON-11-E to include Strings and add a .
    operator which concatenates them.

  2. Modify your solution from Lesson 1.9, Exercise 2 by using an Exp sort to
    express the integer and Boolean expressions that it supports, in the same style
    as LESSON-11-E. Then write an eval function that evaluates all terms of
    sort Exp to either a Bool or an Int.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.12: Syntactic Lists.

Lesson 1.12: Syntactic Lists

The purpose of this lesson is to explain how K provides support for syntactic
repetition through the use of the List{} and NeList{} constructs,
generally called syntactic lists.

The List{} construct

Sometimes, when defining a grammar in K, it is useful to define a syntactic
construct consisting of an arbitrary-length sequence of items. For example,
you might wish to define a function call construct, and need to express a way
of passing arguments to the function. You can in theory simply define these
productions using ordinary constructors, but it can be tricky to get the syntax
exactly right in K without a lot of tedious glue code.

For this reason, K provides a way of specifying that a non-terminal represents
a syntactic list (lesson-12-a.k):

module LESSON-12-A-SYNTAX
  imports INT-SYNTAX

  syntax Ints ::= List{Int,","}
endmodule

module LESSON-12-A
  imports LESSON-12-A-SYNTAX
endmodule

Note that instead of a sequence of terminals and non-terminals, the right hand
side of the Ints production contains the symbol List followed by two items
in curly braces. The first item is the non-terminal which is the element type
of the list, and the second item is a terminal representing the separator of
the list. As a special case, lists which are separated only by whitespace can
be specified with a separator of "".

This List{} construct is roughly equivalent to the following definition
(lesson-12-b.k):

module LESSON-12-B-SYNTAX
  imports INT-SYNTAX

  syntax Ints ::= Int "," Ints | ".Ints"
endmodule

module LESSON-12-B
  imports LESSON-12-B-SYNTAX
endmodule

As you can see, the List{} construct represents a cons-list with an element
at the head and another list at the tail. The empty list is represented by
a . followed by the sort of the list.

However, the List{} construct provides several key syntactic conveniences
over the above definition. First of all, when writing a list in a rule,
explicitly writing the terminator is not always required. For example, consider
the following additional module (lesson-12-c.k):

module LESSON-12-C
  imports LESSON-12-A
  imports INT

  syntax Int ::= sum(Ints) [function]
  rule sum(I:Int) => I
  rule sum(I1:Int, I2:Int, Is:Ints) => sum(I1 +Int I2, Is)
endmodule

Here we see a function that sums together a non-empty list of integers. Note in
particular the first rule. We do not explicitly mention .Ints, but in fact,
the rule in question is equivalent to the following rule:

  rule sum(I:Int, .Ints) => I

The reason for this is that K will automatically insert a list terminator
anywhere a syntactic list is expected, but an element of that list appears
instead. This works even with lists of more than one element:

  rule sum(I1:Int, I2:Int) => I1 +Int I2

This rule is redundant, but here we explicitly match a list of exactly two
elements, because the .Ints is implicitly added after I2.
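
The same conventions carry over to other functions on syntactic lists. As a
sketch, the module below counts the elements of an Ints list. The module name
LESSON-12-C-COUNT and the function name count are hypothetical and are not
part of the tutorial's files; the empty list is matched explicitly with
.Ints, and the second rule uses the same head-and-tail pattern as sum.

```k
module LESSON-12-C-COUNT
  imports LESSON-12-A
  imports INT

  // Hypothetical helper: number of elements in an Ints list.
  syntax Int ::= count(Ints) [function]

  // The empty list is written explicitly as .Ints.
  rule count(.Ints) => 0

  // Head/tail match: Is binds the remainder of the list, which may be empty.
  rule count(_:Int, Is:Ints) => 1 +Int count(Is)
endmodule
```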

Parsing Syntactic Lists in Programs

An additional syntactic convenience takes place when you want to express a
syntactic list in the input to krun. In this case, K will automatically
transform the grammar in LESSON-12-B-SYNTAX into the following
(lesson-12-d.k):

module LESSON-12-D
  imports INT-SYNTAX

  syntax Ints ::= #NonEmptyInts | #IntsTerminator
  syntax #NonEmptyInts ::= Int "," #NonEmptyInts
                         | Int #IntsTerminator
  syntax #IntsTerminator ::= ""
endmodule

This allows you to express the usual comma-separated list of arguments where
an empty list is represented by the empty string, and you don't have to
explicitly terminate the list. Because of this, we can write the syntax
of function calls in C very easily (lesson-12-e.k):

module LESSON-12-E
  syntax Id ::= r"[a-zA-Z_][a-zA-Z0-9_]*" [token]
  syntax Exp ::= Id | Exp "(" Exps ")"
  syntax Exps ::= List{Exp,","}
endmodule

Exercise

Write a function concat which takes a list of String and concatenates them
all together. Do not worry if the function is O(n^2).
Test your implementation using the syntactic sugar for lists added by the parser.

Then write some function call expressions using identifiers in C and verify with
kast that the above grammar captures the intended syntax. Make sure to test
with function calls with zero, one, and two or more arguments.

The NeList{} construct

One limitation of the List{} construct is that it is always possible to
write a list of zero elements where a List{} is expected. While this is
desirable in a number of cases, it is sometimes not what the grammar expects.

For example, in C, it is not allowable for an enum definition to have zero
members. In other words, if we were to write the grammar for enumerations like
so (lesson-12-f.k):

module LESSON-12-F
  syntax Id ::= r"[a-zA-Z_][a-zA-Z0-9_]*" [token]
  syntax Exp ::= Id

  syntax EnumSpecifier ::= "enum" Id "{" Ids "}"
  syntax Ids ::= List{Id,","}
endmodule

Then we would be syntactically allowed to write enum X {}, which ought
instead to be a syntax error.

For this reason, we introduce the additional NeList{} construct. The syntax
is identical to List{}, except with NeList instead of List before the
curly braces. When parsing rules, it behaves identically to the List{}
construct. However, when parsing inputs to krun, the above grammar, if we
replaced syntax Ids ::= List{Id,","} with syntax Ids ::= NeList{Id,","},
would become equivalent to the following (lesson-12-g.k):

module LESSON-12-G
  syntax Id ::= r"[a-zA-Z_][a-zA-Z0-9_]*" [token]
  syntax Exp ::= Id

  syntax EnumSpecifier ::= "enum" Id "{" Ids "}"
  syntax Ids ::= Id | Id "," Ids
endmodule

In other words, only non-empty lists of Id would be allowed.

Exercises

  1. Modify the sum function in LESSON-12-C so that the Ints sort is an
    NeList{}. Verify that calling sum() with no arguments is now a syntax
    error.

  2. Write a modified sum function with the List construct that can also sum
    up an empty list of arguments. In such a case, the sum ought to be 0.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.13: Basics of K Rewriting.

Lesson 1.13: Basics of K Rewriting

The purpose of this lesson is to explain how rewrite rules that are not the
definition of a function behave, and how, using these rules, you can construct
a semantics of programs in a programming language in K.

Recap: Function rules in K

Recall from Lesson 1.2 that we have, thus far,
introduced two types of productions in K: constructors and functions.
A function is identified by the function attribute placed on the
production. As you may recall, when we write a rule with a function on the
left-hand side of the => operator, we are defining the meaning of that
function for inputs which match the patterns on the left-hand side of the rule.
If the arguments to the function match the patterns, then the function is
evaluated to the value constructed by substituting the bindings for the
variables into the right-hand side of the rule.

Top-level rules

However, function rules are not the only type of rule permissible in K, nor
even the most frequently used. K also has a concept of a
top-level rewrite rule. The simplest way to ensure that a rule is treated
as a top-level rule is for the left-hand side of the rule to mention one or
more cells. We will cover how cells work and are declared in more detail
in a later lesson, but for now, recall that when we ran krun in our very
first example in Lesson 1.2, we got the following output:

<k>
  Yellow ( ) ~> .
</k>

<k> is a cell, known by convention as the K cell. This cell is available
by default in any definition without needing to be explicitly declared.

The K cell contains a single term of sort K. K is a predefined sort in K
with two constructors, that can be roughly represented by the following
grammar:

  syntax K ::= KItem "~>" K
             | "."

As a syntactic convenience, K allows you to treat ~> as an associative list
constructor (i.e., as if it were defined as syntax K ::= K "~>" K).
When a definition is compiled, it will automatically transform the rules you
write so that they treat the K sort as a cons-list. Another syntactic
convenience is that, for disambiguation purposes, you can write .K anywhere
you would otherwise write . and the meaning is identical.

Now, you may notice that the above grammar mentions the sort KItem. This is
another built-in sort in K. For every sort S declared in a definition (with
the exception of K and KItem), K will implicitly insert the following
production:

  syntax KItem ::= S

In other words, every sort is a subsort of the sort KItem, and thus a term
of any sort can be injected as an element of a term of sort K, also called
a K sequence.

By default, when you krun a program, the AST of the program is inserted as
the sole element of a K sequence into the <k> cell. This explains why we
saw the output we did in Lesson 1.2.

With these preliminaries in mind, we can now explain how top-level rewrite
rules work in K. Put simply, any rule where there is a cell (such as the K
cell) at the top on the left-hand side will be a top-level rewrite rule. Once
the initial program has been inserted into the K cell, the resulting term,
called the configuration, will be matched against all the top-level
rewrite rules in the definition. If only one rule matches, the substitution
generated by the matching is applied to the right-hand side of the rule,
and the resulting term becomes the new configuration. Rewriting
proceeds by iteratively applying rules, also called taking steps, until
no top-level rewrite rule can be applied. At this point the configuration
becomes the final configuration and is output by krun.

If more than one top-level rule applies, by default, K will pick just one
of those rules, apply it, and continue rewriting. However, it is
non-deterministic which rule applies. In theory, it could be any of them.
By passing the --search flag to krun, you are able to tell krun to
explore all possible non-deterministic choices, and generate a complete list of
all possible final configurations reachable by each nondeterministic choice that
can be made. Note that the --search flag to krun only works if you pass
--enable-search to kompile first.
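
As a minimal sketch of such a nondeterministic choice, consider the definition
below. The module names and the Color sort are hypothetical and chosen only
for illustration. Both rules match a <k> cell whose first element is Red, so
for the input program Red an ordinary krun is free to apply either one, while
krun --search (after compiling with kompile --enable-search) is expected to
report both outcomes.

```k
module LESSON-13-SEARCH-SYNTAX
  // Hypothetical sort used only for this illustration.
  syntax Color ::= "Red" | "Green" | "Blue"
endmodule

module LESSON-13-SEARCH
  imports LESSON-13-SEARCH-SYNTAX

  // Two top-level rules with the same left-hand side: either may apply
  // when the first element of the K sequence is Red.
  rule <k> Red ~> K:K </k> => <k> Green ~> K </k>
  rule <k> Red ~> K:K </k> => <k> Blue ~> K </k>
endmodule
```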

Unlike top-level rewrite rules, function rules are not associated with any
particular set of cells in the configuration (although they can contain cells
in their function arguments and return value). While top-level rewrite rules
apply to the entire term being rewritten, function rules apply anywhere a
function application for that function appears, and are immediately rewritten
to their return value in that position.

Another key distinction between top-level rules and function rules is that
function symbols, i.e., productions with the function attribute, are
mathematical functions rather than constructors. While a constructor is
logically distinct from any other constructor of the same sort, and can be
matched against unconditionally, a function does not necessarily have the
same restriction unless it happens to be an injective function. Thus, two
function symbols with different arguments may still ultimately produce the
same value and thus compare equal to one another. Due to this, concrete
execution (i.e., all K definitions introduced thus far; see Lesson 1.21)
introduces the restriction that you cannot match on a function symbol on the
left-hand side of a rule, except as the top symbol on the left-hand side of
a function rule. This restriction will later be lifted when we introduce the
Haskell Backend which performs symbolic execution.

Exercise

Pass a program containing no functions to krun. You can use a term of sort
Exp from LESSON-11-E. Observe the output and try to understand why you get
the output you do. Then write two rules that rewrite that program to another.
Run krun --search on that program and observe both results. Then add a third
rule that rewrites one of those results again. Test that that rule applies as
well.

Using top-level rules to evaluate expressions

Thus far, we have focused primarily on defining functions over constructors
in K. However, now that we have a basic understanding of top-level rules,
it is possible to introduce a rewrite system to our definitions. A rewrite
system is a collection of top-level rewrite rules which performs an organized
transformation of a particular program into a result which expresses the
meaning of that program. For example, we might rewrite an expression in a
programming language into a value representing the result of evaluating that
expression.

Recall that in Lesson 1.11, we wrote a simple grammar of Boolean and integer
expressions that looked roughly like this (lesson-13-a.k):

module LESSON-13-A
  imports INT
  imports BOOL

  syntax Exp ::= Int
               | Bool
               | Exp "+" Exp
               | Exp "&&" Exp
endmodule

In that lesson, we defined a function eval which evaluated such expressions
to either an integer or Boolean.

However, it is more idiomatic to evaluate such expressions using top-level
rewrite rules. Here is how one might do so in K (lesson-13-b.k):

module LESSON-13-B-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  imports BOOL-SYNTAX

  syntax Val ::= Int | Bool
  syntax Exp ::= Val
               > left: Exp "+" Exp
               > left: Exp "&&" Exp
endmodule

module LESSON-13-B
  imports LESSON-13-B-SYNTAX
  imports INT
  imports BOOL

  rule <k> I1:Int + I2:Int ~> K:K </k> => <k> I1 +Int I2 ~> K </k>
  rule <k> B1:Bool && B2:Bool ~> K:K </k> => <k> B1 andBool B2 ~> K </k>

  syntax KItem ::= freezer1(Val) | freezer2(Exp)
                 | freezer3(Val) | freezer4(Exp)

  rule <k> E1:Val + E2:Exp ~> K:K </k> => <k> E2 ~> freezer1(E1) ~> K </k> [priority(51)]
  rule <k> E1:Exp + E2:Exp ~> K:K </k> => <k> E1 ~> freezer2(E2) ~> K </k> [priority(52)]
  rule <k> E1:Val && E2:Exp ~> K:K </k> => <k> E2 ~> freezer3(E1) ~> K </k> [priority(51)]
  rule <k> E1:Exp && E2:Exp ~> K:K </k> => <k> E1 ~> freezer4(E2) ~> K </k> [priority(52)]

  rule <k> E2:Val ~> freezer1(E1) ~> K:K </k> => <k> E1 + E2 ~> K </k>
  rule <k> E1:Val ~> freezer2(E2) ~> K:K </k> => <k> E1 + E2 ~> K </k>
  rule <k> E2:Val ~> freezer3(E1) ~> K:K </k> => <k> E1 && E2 ~> K </k>
  rule <k> E1:Val ~> freezer4(E2) ~> K:K </k> => <k> E1 && E2 ~> K </k>
endmodule

This is of course rather cumbersome currently, but we will soon introduce
syntactic conveniences which make writing definitions of this type considerably
easier. For now, notice that there are roughly 3 types of rules here: the first
matches a K cell in which the first element of the K sequence is an Exp whose
arguments are values, and rewrites the first element of the sequence to the
result of that expression. The second also matches a K cell with an Exp in
the first element of its K sequence, but it matches when one or both arguments
of the Exp are not values, and replaces the first element of the K sequence
with two new elements: one being an argument to evaluate, and the other being
a special constructor called a freezer. Finally, the third matches a K
sequence where a Val is first, and a freezer is second, and replaces them
with a partially evaluated expression.

This general pattern is what is known as heating an expression,
evaluating its arguments, cooling the arguments into the expression
again, and evaluating the expression itself. By repeatedly performing
this sequence of actions, we can evaluate an entire AST containing a complex
expression down into its resulting value.
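
The following hand-derived trace sketches this cycle for the program
1 + 2 + 3 under LESSON-13-B. It is worked out from the rules and their
priorities rather than copied from krun, so the exact formatting of real
output may differ. Because + is declared left-associative, the program parses
as (1 + 2) + 3.

```k
// Contents of the K cell after each step (hand-derived, not krun output).
<k> 1 + 2 + 3 ~> . </k>               // initial configuration
<k> 1 + 2 ~> freezer2(3) ~> . </k>    // heat: left operand is not a Val (priority 52 rule)
<k> 3 ~> freezer2(3) ~> . </k>        // evaluate: I1:Int + I2:Int => I1 +Int I2
<k> 3 + 3 ~> . </k>                   // cool: a Val followed by freezer2
<k> 6 ~> . </k>                       // evaluate: no further rule applies
```

Note that in the first step the plain addition rule cannot apply, because the
left operand 1 + 2 is an Exp rather than an Int, and the priority 51 heating
rule cannot apply either, because that operand is not a Val.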

Exercise

Write an addition expression with integers. Use krun --depth 1 to see the
result of rewriting after applying a single top-level rule. Gradually increase
the value of --depth to see successive states. Observe how this combination
of rules is eventually able to evaluate the entire expression.

Simplifying the evaluator: Local rewrites and cell ellipses

As you saw above, the definition we wrote is rather cumbersome. Over the
remainder of Lessons 1.13 and 1.14, we will greatly simplify it. The first step
in doing so is to explain a bit more about the rewrite operator, =>. Thus far,
all the rules we have written look like rule LHS => RHS. However, this is not
the only way the rewrite operator can be used. It is actually possible to place
a constructor or function at the very top of the rule, and place rewrite
operators inside that term. While a rewrite operator cannot appear nested
inside another rewrite operator, by doing this, we can express that some parts
of what we are matching are not changed by the rewrite operator. For
example, consider the following rule from above:

  rule <k> I1:Int + I2:Int ~> K:K </k> => <k> I1 +Int I2 ~> K </k>

We can equivalently write it as follows:

  rule <k> (I1:Int + I2:Int => I1 +Int I2) ~> _:K </k>

When you put a rewrite inside a term like this, in essence, you are telling
the rule to only rewrite part of the left-hand side to the right-hand side.
In practice, this is implemented by lifting the rewrite operator to the top of
the rule by means of duplicating the surrounding context.

There is a way that the above rule can be simplified further, however. K
provides a special syntax for each cell containing a term of sort K, indicating
that we want to match only on some prefix of the K sequence. For example, the
above rule can be simplified further like so:

  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>

Here we have placed the symbol ... immediately prior to the </k> which ends
the cell. What this tells the compiler is to take the contents of the cell,
treat it as the prefix of a K sequence, and insert an anonymous variable of
sort K at the end. Thus we can think of ... as a way of saying we
don't care about the part of the K sequence after the beginning, leaving
it unchanged.

Putting all this together, we can rewrite LESSON-13-B like so
(lesson-13-c.k):

module LESSON-13-C-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  imports BOOL-SYNTAX

  syntax Val ::= Int | Bool
  syntax Exp ::= Val
               > left: Exp "+" Exp
               > left: Exp "&&" Exp
endmodule

module LESSON-13-C
  imports LESSON-13-C-SYNTAX
  imports INT
  imports BOOL

  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>
  rule <k> B1:Bool && B2:Bool => B1 andBool B2 ...</k>

  syntax KItem ::= freezer1(Val) | freezer2(Exp)
                 | freezer3(Val) | freezer4(Exp)

  rule <k> E1:Val + E2:Exp => E2 ~> freezer1(E1) ...</k> [priority(51)]
  rule <k> E1:Exp + E2:Exp => E1 ~> freezer2(E2) ...</k> [priority(52)]
  rule <k> E1:Val && E2:Exp => E2 ~> freezer3(E1) ...</k> [priority(51)]
  rule <k> E1:Exp && E2:Exp => E1 ~> freezer4(E2) ...</k> [priority(52)]

  rule <k> E2:Val ~> freezer1(E1) => E1 + E2 ...</k>
  rule <k> E1:Val ~> freezer2(E2) => E1 + E2 ...</k>
  rule <k> E2:Val ~> freezer3(E1) => E1 && E2 ...</k>
  rule <k> E1:Val ~> freezer4(E2) => E1 && E2 ...</k>
endmodule

This is still rather cumbersome, but it is already greatly simplified. In the
next lesson, we will see how additional features of K can be used to specify
heating and cooling rules much more compactly.

Exercises

  1. Modify LESSON-13-C to add rules to evaluate integer subtraction.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.14: Defining Evaluation Order.

Lesson 1.14: Defining Evaluation Order

The purpose of this lesson is to explain how to use the heat and cool
attributes, context and context alias sentences, and the strict and
seqstrict attributes to more compactly express heating and cooling in K,
and to express more advanced evaluation strategies in K.

The heat and cool attributes

Thus far, we have been using rule priority and casts to express when to heat
an expression and when to cool it. For example, the rules for heating have
lower priority, so they do not apply if the term could be evaluated instead,
and the rules for cooling are expressly written only to apply if the term
being cooled is a value.

However, K has built-in support for deciding when to heat and when to cool.
This support comes in the form of the rule attributes heat and cool as
well as the specially named function isKResult.

Consider the following definition, which is equivalent to LESSON-13-C
(lesson-14-a.k):

module LESSON-14-A-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  imports BOOL-SYNTAX

  syntax Exp ::= Int
               | Bool
               > left: Exp "+" Exp
               > left: Exp "&&" Exp
endmodule

module LESSON-14-A
  imports LESSON-14-A-SYNTAX
  imports INT
  imports BOOL

  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>
  rule <k> B1:Bool && B2:Bool => B1 andBool B2 ...</k>

  syntax KItem ::= freezer1(Exp) | freezer2(Exp)
                 | freezer3(Exp) | freezer4(Exp)

  rule <k> E:Exp + HOLE:Exp => HOLE ~> freezer1(E) ...</k>
    requires isKResult(E) [heat]
  rule <k> HOLE:Exp + E:Exp => HOLE ~> freezer2(E) ...</k> [heat]
  rule <k> E:Exp && HOLE:Exp => HOLE ~> freezer3(E) ...</k>
    requires isKResult(E) [heat]
  rule <k> HOLE:Exp && E:Exp => HOLE ~> freezer4(E) ...</k> [heat]

  rule <k> HOLE:Exp ~> freezer1(E) => E + HOLE ...</k> [cool]
  rule <k> HOLE:Exp ~> freezer2(E) => HOLE + E ...</k> [cool]
  rule <k> HOLE:Exp ~> freezer3(E) => E && HOLE ...</k> [cool]
  rule <k> HOLE:Exp ~> freezer4(E) => HOLE && E ...</k> [cool]

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_:Bool) => true
  rule isKResult(_) => false [owise]
endmodule

We have introduced three major changes to this definition. First, we have
removed the Val sort. We replace it instead with a function isKResult.
The function in question must have the same signature and attributes as seen in
this example. It ought to return true whenever a term should not be heated
(because it is a value) and false when it should be heated (because it is not
a value). We thus also insert isKResult calls in the side condition of two
of the heating rules, where the Val sort was previously used.

Second, we have removed the rule priorities on the heating rules and the use of
the Val sort on the cooling rules, and replaced them with the heat and
cool attributes. These attributes instruct the compiler that these rules are
heating and cooling rules, and thus should implicitly apply only when certain
terms on the LHS either are or are not a KResult (i.e., isKResult returns
true versus false).

Third, we have renamed some of the variables in the heating and cooling rules
to the special variable HOLE. Syntactically, HOLE is just a special name
for a variable, but it is treated specially by the compiler. By naming a
variable HOLE, we have informed the compiler which term is being heated
or cooled. The compiler will automatically insert the side condition
requires isKResult(HOLE) to cooling rules and the side condition
requires notBool isKResult(HOLE) to heating rules.
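
To illustrate, the fragment below sketches what the rules for + in
LESSON-14-A amount to once these implicit side conditions are written out. It
is a conceptual expansion meant to be read rather than added to the module:
the heat and cool attributes are dropped, and HOLE is renamed to an ordinary
variable H (a choice made here purely for illustration) so that no special
treatment is implied.

```k
  // Heating the left operand: applies only while it is not yet a result.
  rule <k> H:Exp + E:Exp => H ~> freezer2(E) ...</k>
    requires notBool isKResult(H)

  // Heating the right operand: the explicit condition on E is kept and the
  // implicit condition on the heated term is added to it.
  rule <k> E:Exp + H:Exp => H ~> freezer1(E) ...</k>
    requires isKResult(E) andBool notBool isKResult(H)

  // Cooling: applies only once the heated term has become a result.
  rule <k> H:Exp ~> freezer2(E) => H + E ...</k>
    requires isKResult(H)
  rule <k> H:Exp ~> freezer1(E) => E + H ...</k>
    requires isKResult(H)
```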

Exercise

Modify LESSON-14-A to add rules to evaluate integer subtraction.

Simplifying further with Contexts

The above example is still rather cumbersome to write. We must explicitly write
both the heating and the cooling rule separately, even though they are
essentially inverses of one another. It would be nice to instead simply
indicate which terms should be heated and cooled, and what part of them to
operate on.

To do this, K introduces a new type of sentence, the context. Contexts
begin with the context keyword instead of the rule keyword, and usually
do not contain a rewrite operator.

Consider the following definition which is equivalent to LESSON-14-A
(lesson-14-b.k):

module LESSON-14-B-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  imports BOOL-SYNTAX

  syntax Exp ::= Int
               | Bool
               > left: Exp "+" Exp
               > left: Exp "&&" Exp
endmodule

module LESSON-14-B
  imports LESSON-14-B-SYNTAX
  imports INT
  imports BOOL

  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>
  rule <k> B1:Bool && B2:Bool => B1 andBool B2 ...</k>

  context <k> E:Exp + HOLE:Exp ...</k>
    requires isKResult(E)
  context <k> HOLE:Exp + _:Exp ...</k>
  context <k> E:Exp && HOLE:Exp ...</k>
    requires isKResult(E)
  context <k> HOLE:Exp && _:Exp ...</k>

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_:Bool) => true
  rule isKResult(_) => false [owise]
endmodule

In this example, the heat and cool rules have been removed entirely, as
have been the productions defining the freezers. Don't worry, they still exist
under the hood; the compiler is just generating them automatically. For each
context sentence like above, the compiler generates a #freezer production,
a heat rule, and a cool rule. The generated form is equivalent to the
rules we wrote manually in LESSON-14-A. However, we are now starting to
considerably simplify the definition. Instead of 3 sentences, we just have one.
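
As a sketch of that correspondence, the single context for the left operand of
+ stands in for the three sentences below, taken from LESSON-14-A. The name
freezer2 is reused from that module for readability; the production the
compiler actually generates is a #freezer production whose exact name is not
shown here.

```k
  // context <k> HOLE:Exp + _:Exp ...</k>
  // corresponds to one freezer production, one heat rule, and one cool rule:
  syntax KItem ::= freezer2(Exp)
  rule <k> HOLE:Exp + E:Exp => HOLE ~> freezer2(E) ...</k> [heat]
  rule <k> HOLE:Exp ~> freezer2(E) => HOLE + E ...</k> [cool]
```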

context alias sentences and the strict and seqstrict attributes

Notice that the contexts we included in LESSON-14-B still seem rather
similar in form. For each expression we want to evaluate, we are declaring
one context for each operand of that expression, and they are each rather
similar to one another. We would like to be able to simplify further by
simply annotating each expression production with information about how
it is to be evaluated instead. We can do this with the seqstrict attribute.

Consider the following definition, once again equivalent to those above
(lesson-14-c.k):

module LESSON-14-C-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  imports BOOL-SYNTAX

  syntax Exp ::= Int
               | Bool
               > left: Exp "+" Exp [seqstrict(exp; 1, 2)]
               > left: Exp "&&" Exp [seqstrict(exp; 1, 2)]
endmodule

module LESSON-14-C
  imports LESSON-14-C-SYNTAX
  imports INT
  imports BOOL

  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>
  rule <k> B1:Bool && B2:Bool => B1 andBool B2 ...</k>

  context alias [exp]: <k> HERE ...</k>

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_:Bool) => true
  rule isKResult(_) => false [owise]
endmodule

This definition has two important changes from the one above. The first is
that the individual context sentences have been removed and have been
replaced with a single context alias sentence. You may notice that this
sentence begins with an identifier in square brackets followed by a colon. This
syntax is a way of naming individual sentences in K for reference by the tool
or by other sentences. The context alias sentence also has a special variable
HERE.

The second is that the productions in LESSON-14-C-SYNTAX have been given a
seqstrict attribute. The value of this attribute has two parts. The first
is the name of a context alias sentence. The second is a comma-separated list
of integers. Each integer represents an index of a non-terminal in the
production, counting from 1. For each integer present, the compiler implicitly
generates a new context sentence according to the following rules:

  1. The compiler starts by looking for the context alias sentence named. If
    there is more than one, then one context sentence is created per
    context alias sentence with that name.
  2. For each context created, the variable HERE in the context alias is
    substituted with an instance of the production the seqstrict attribute is
    attached to. Each child of that production is a variable. The non-terminal
    indicated by the integer offset of the seqstrict attribute is given the name
    HOLE.
  3. For each integer offset that appears in the list before the one currently
    being processed, the predicate isKResult(E) is added to the side condition
    of the context, where E is the child of the production term with that
    offset, starting from 1. If there is more than one such predicate, they
    are conjoined. For example, if the attribute lists 1, 2, then
    the rule generated for the 2 will include isKResult(E1) where E1 is the
    first child of the production.

As you can see if you work through the process, the above code will ultimately
generate the same contexts present in LESSON-14-B.
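
To make the three steps concrete, here is a hand-written sketch of the contexts
we would expect the compiler to derive for a single + production carrying
seqstrict(exp; 1, 2). The module names and the variable name E1 are
illustrative only; the compiler chooses its own names for generated variables.

```k
module SEQSTRICT-EXPANSION-SKETCH-SYNTAX
  imports UNSIGNED-INT-SYNTAX

  syntax Exp ::= Int
               | Exp "+" Exp [left]
endmodule

module SEQSTRICT-EXPANSION-SKETCH
  imports SEQSTRICT-EXPANSION-SKETCH-SYNTAX
  imports INT
  imports BOOL

  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>

  // Offset 1: HERE is replaced by the production and child 1 becomes HOLE.
  // No offset precedes it in the list, so there is no side condition.
  context <k> HOLE:Exp + _:Exp ...</k>

  // Offset 2: child 2 becomes HOLE. Offset 1 precedes it in the list, so
  // isKResult of child 1 becomes the side condition.
  context <k> E1:Exp + HOLE:Exp ...</k> requires isKResult(E1)

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_) => false [owise]
endmodule
```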

Finally, note that there are a few minor syntactic conveniences provided by the
seqstrict attribute. First, in the special case of the context alias sentence
being <k> HERE ...</k>, you can omit both the context alias sentence
and the name from the seqstrict attribute.

Second, if the numbered list of offsets contains every non-terminal in the
production, it can be omitted from the attribute value.

Thus, we can finally produce the idiomatic K definition for this example
(lesson-14-d.k):

module LESSON-14-D-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  imports BOOL-SYNTAX

  syntax Exp ::= Int
               | Bool
               > left: Exp "+" Exp [seqstrict]
               > left: Exp "&&" Exp [seqstrict]
endmodule

module LESSON-14-D
  imports LESSON-14-D-SYNTAX
  imports INT
  imports BOOL

  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>
  rule <k> B1:Bool && B2:Bool => B1 andBool B2 ...</k>

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_:Bool) => true
  rule isKResult(_) => false [owise]
endmodule

Exercise

Modify LESSON-14-D to add a production and rule to evaluate integer
subtraction.

Nondeterministic evaluation order with the strict attribute

Thus far, we have focused entirely on deterministic evaluation order. However,
not all languages are deterministic in the order they evaluate expressions.
For example, in C, the expression a() + b() + c() is guaranteed to parse
to (a() + b()) + c(), but it is not guaranteed that a will be called before
b before c. In fact, the C standard leaves this evaluation order unspecified,
so from the point of view of a semantics it is non-deterministic.

We can express non-deterministic evaluation orders with the strict attribute.
Its behavior is identical to the seqstrict attribute, except that step 3 in
the above list (with the side condition automatically added) does not take
place. In other words, if we wrote syntax Exp ::= Exp "+" Exp [strict]
instead of syntax Exp ::= Exp "+" Exp [seqstrict], it would generate the
following two contexts instead of the ones found in LESSON-14-B:

  context <k> _:Exp + HOLE:Exp ...</k>
  context <k> HOLE:Exp + _:Exp ...</k>

As you can see, these contexts will generate heating rules that can both
apply to the same term. As a result, the choice of which heating rule
applies first is non-deterministic, and as we saw in Lesson 1.13, we can
get all possible behaviors by passing --search to krun.

Exercises

  1. Add integer division to LESSON-14-D. Make division and addition strict
    instead of seqstrict, and write a rule evaluating integer division with a
    side condition that the denominator is non-zero. Run krun --search on the
    program 1 / 0 + 2 / 1 and observe all possible outputs of the program. How
    many are there total, and why?

  2. Rework your solution from Lesson 1.9, Exercise 2 to evaluate expressions from left to right using the seqstrict attribute.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.15: Configuration Declarations and Cell Nesting.

Lesson 1.15: Configuration Declarations and Cell Nesting

The purpose of this lesson is to explain how to store additional information
about the state of your interpreter by declaring cells using the
configuration sentence, as well as how to add additional inputs to your
definition.

Cells and Configuration Declarations

We have already covered the absolute basics of cells in K by looking at the
<k> cell. As explained in Lesson 1.13, the
<k> cell is available without being explicitly declared. It turns out this is
because, if the user does not explicitly specify a configuration sentence
anywhere in the main module of their definition, the configuration sentence
from the DEFAULT-CONFIGURATION module of
kast.md is imported
automatically. Here is what that sentence looks like:

  configuration <k> $PGM:K </k>

This configuration declaration declares a single cell, the <k> cell. It also
declares that at the start of rewriting, the contents of that cell should be
initialized with the value of the $PGM configuration variable.
Configuration variables function as inputs to krun. These terms are supplied
to krun in the form of ASTs parsed using a particular module. By default, the
$PGM configuration variable uses the main syntax module of the definition.

The cast on the configuration variable also specifies the sort that is used as
the entry point to the parser, in this case the K sort. It is often
useful to cast to other sorts there as well for better control over the accepted
language. The sort used for the $PGM variable is referred to as the start
symbol. During parsing, the default start symbol K subsumes all user-defined
sorts except for syntactic lists. These are excluded because they will always
produce an ambiguity error when parsing a single element.
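
As a minimal sketch of choosing a different start symbol, the definition below
casts $PGM to a user-defined sort Exp instead of K. The module names and the
Exp sort are assumptions made for this illustration.

```k
module START-SYMBOL-SKETCH-SYNTAX
  imports UNSIGNED-INT-SYNTAX

  syntax Exp ::= Int
               | Exp "+" Exp [left]
endmodule

module START-SYMBOL-SKETCH
  imports START-SYMBOL-SKETCH-SYNTAX
  imports INT

  // Casting $PGM to Exp makes Exp the start symbol, so the program passed
  // to krun must parse as an Exp.
  configuration <k> $PGM:Exp </k>

  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>
endmodule
```

This sketch declares no evaluation order, so only programs of the form
1 + 2 are fully reduced; the point is the cast in the configuration.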

Note that we did not explicitly specify the $PGM configuration variable when
we invoked krun on a file. This is because krun handles the $PGM variable
specially, and allows you to pass the term for that variable via a file passed
as a positional argument to krun. We did, however, specify the PGM name
explicitly when we called krun with the -cPGM command line argument in
Lesson 1.2. This is the other, explicit, way of
specifying an input to krun.

This explains the most basic use of configuration declarations in K. We can,
however, declare multiple cells and multiple configuration variables. We can
also specify the initial values of cells statically, rather than dynamically
via krun.

For example, consider the following definition (lesson-15-a.k):

module LESSON-15-A-SYNTAX
  imports INT-SYNTAX

  syntax Ints ::= List{Int,","}
endmodule

module LESSON-15-A
  imports LESSON-15-A-SYNTAX
  imports INT

  configuration <k> $PGM:Ints </k>
                <sum> 0 </sum>

  rule <k> I:Int, Is:Ints => Is ...</k>
       <sum> SUM:Int => SUM +Int I </sum>
endmodule

This simple definition takes a list of integers as input and sums them
together. Here we have declared two cells: <k> and <sum>. Unlike <k>,
<sum> does not get initialized via a configuration variable, but instead
is initialized statically with the value 0.

Note the rule in the second module: we have explicitly specified multiple
cells in a single rule. K will expect each of these cells to match in order for
the rule to apply.

Here is a second example (lesson-15-b.k):

module LESSON-15-B-SYNTAX
  imports INT-SYNTAX
endmodule

module LESSON-15-B
  imports LESSON-15-B-SYNTAX
  imports INT
  imports BOOL

  configuration <k> . </k>
                <first> $FIRST:Int </first>
                <second> $SECOND:Int </second>

  rule <k> . => FIRST >Int SECOND </k>
       <first> FIRST </first>
       <second> SECOND </second>
endmodule

This definition takes two integers as command-line arguments and populates the
<k> cell with a Boolean indicating whether the first integer is greater than
the second. Notice that we have specified no $PGM configuration variable
here. As a result, we cannot invoke krun via the syntax krun $file.
Instead, we must explicitly pass values for each configuration variable via the
-cFIRST and -cSECOND command line flags. For example, if we invoke
krun -cFIRST=0 -cSECOND=1, we will get the value false in the <k> cell.

You can also specify both a $PGM configuration variable and other
configuration variables in a single configuration declaration, in which case
you would be able to initialize $PGM with either a positional argument or the
-cPGM command line flag, but the other configuration variables would need
to be explicitly initialized with -c.
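
The following sketch combines both kinds of input. It is a variant of
LESSON-15-A in which the starting value of the <sum> cell comes from a second
configuration variable; the module names and the variable name $INIT are
assumptions made for this illustration.

```k
module SUM-FROM-SKETCH-SYNTAX
  imports INT-SYNTAX

  syntax Ints ::= List{Int,","}
endmodule

module SUM-FROM-SKETCH
  imports SUM-FROM-SKETCH-SYNTAX
  imports INT

  // $PGM may come from a positional file argument or from -cPGM.
  // $INIT has no such shortcut and must be passed with -cINIT.
  configuration <k> $PGM:Ints </k>
                <sum> $INIT:Int </sum>

  rule <k> I:Int, Is:Ints => Is ...</k>
       <sum> SUM:Int => SUM +Int I </sum>
endmodule
```

With this definition, an invocation such as krun ints.txt -cINIT=10 (where
ints.txt is a hypothetical input file containing a list of integers) would
start summing from 10 rather than 0.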

Exercise

Modify your solution to Lesson 1.14, Exercise 2 to add a new cell with a
configuration variable of sort Bool. This variable should determine whether
the / operator is evaluated using /Int or divInt. Test that by specifying
different values for this variable, you can change the behavior of rounding on
division of negative numbers.

Cell Nesting

It is possible to nest cells inside one another. A cell that contains other
cells must contain only other cells, but in doing this, you are able to
create a hierarchical structure to the configuration. Consider the following
definition (lesson-15-c.k), which is equivalent to the one in LESSON-15-B:

module LESSON-15-C-SYNTAX
  imports INT-SYNTAX
endmodule

module LESSON-15-C
  imports LESSON-15-C-SYNTAX
  imports INT
  imports BOOL

  configuration <T>
                  <k> . </k>
                  <state>
                    <first> $FIRST:Int </first>
                    <second> $SECOND:Int </second>
                  </state>
                </T>

  rule <k> . => FIRST >Int SECOND </k>
       <first> FIRST </first>
       <second> SECOND </second>
endmodule

Note that we have added some new cells to the configuration declaration:
the <T> cell wraps the entire configuration, and the <state> cell is
introduced around the <first> and <second> cells.

However, we have not changed the rule in this definition. This is because of
a concept in K called configuration abstraction. K allows you to specify
any number of cells in a rule (except zero) in any order you want, and K will
compile the rules into a form that matches the structure of the configuration
specified by the configuration declaration.

Here then, is how this rule would look after the configuration abstraction
has been resolved:

  rule <T>
         <k> . => FIRST >Int SECOND </k>
         <state>
           <first> FIRST </first>
           <second> SECOND </second>
         </state>
       </T>

In other words, K will complete cells to the top of the configuration by
inserting parent cells where appropriate based on the declared structure of
the configuration. This is useful because as a definition evolves, the
configuration may change, but you don't want to have to modify every single
rule each time. Thus, K follows the principle that you should only mention the
cells in a rule that are actually needed in order to accomplish its specific
goal. By following this best practice, you can significantly increase the
modularity of the definition and make it easier to maintain and modify.
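
As a further sketch of this principle, the rule below needs only the <k> and
<first> cells, so those are the only cells it mentions. The module name is
illustrative, and the configuration is copied from LESSON-15-C.

```k
module ABSTRACTION-SKETCH
  imports INT

  configuration <T>
                  <k> . </k>
                  <state>
                    <first> $FIRST:Int </first>
                    <second> $SECOND:Int </second>
                  </state>
                </T>

  // Only <k> and <first> are mentioned. The enclosing <T> and <state> cells
  // are inserted by the compiler, and <second> is left unconstrained.
  rule <k> . => FIRST </k>
       <first> FIRST </first>
endmodule
```

If a later revision of the configuration moved <first> into a different parent
cell, this rule would not need to change.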

Note that unlike top-level rewrite rules, cells that appear inside function
rules are not necessarily completed to the top of the configuration. They still
participate in cell completion in the sense that you can mention cell
structure loosely inside a function rule and it will be completed into the
correct cell structure specified by the configuration declaration. However,
they do not complete all the way to the top, instead completing only up to
the top-most cell mentioned in the rule.

For example, if we write the following function rule in the above definition:

  rule doStuff(<first> FIRST </first>) => FIRST

The function will only match on the first cell, rather than the entire
configuration. However, if we had mentioned a parent cell in the rule, it still
would have completed the children of that parent cell as needed to ensure that
the resulting term is well formed.
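
The rule above needs a matching function declaration, which the text does not
show. The sketch below supplies one possible declaration as an assumption: it
takes an argument of sort FirstCell, the sort K generates for the <first> cell
(cell sorts are described under Cell Variables below), and returns an Int. The
module name is illustrative.

```k
module CELL-FUNCTION-SKETCH
  imports INT

  configuration <T>
                  <k> . </k>
                  <state>
                    <first> $FIRST:Int </first>
                    <second> $SECOND:Int </second>
                  </state>
                </T>

  // Assumed declaration: a function from the <first> cell to its contents.
  syntax Int ::= doStuff(FirstCell) [function]

  // Completion stops at <first>, the top-most cell mentioned in the rule,
  // so neither <state> nor <T> is added around it.
  rule doStuff(<first> FIRST </first>) => FIRST
endmodule
```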

Exercise

Modify your definition from the previous exercise in this lesson to wrap the
two cells you have declared in a top cell <T>. You should not have to change
any other rules in the definition.

Cell Variables

Sometimes it is desirable to explicitly match a variable against certain
fragments of the configuration. Because K's configuration is hierarchical,
we can grab subsets of the configuration as if they were just another term.
However, configuration abstraction applies here as well.
In particular, for each cell you specify in a configuration declaration, a
unique sort is assigned for that cell with a single constructor (the cell
itself). The sort name is taken by removing all special characters,
capitalizing the first letter and each letter after a hyphen, and adding the
word Cell at the end. For example, in the above example, the cell sorts are
TCell, KCell, StateCell, FirstCell, and SecondCell. If we had declared
a cell as <first-number>, then the cell sort name would be FirstNumberCell.

You can explicitly reference a variable of one of these sorts anywhere you
might instead write that cell. For example, consider the following rule:

  rule <k> true => S </k>
       (S:StateCell => <state>... .Bag ...</state>)

Here we have introduced two new concepts. The first is the variable of sort
StateCell, which matches the entire <state> part of the configuration. The
second is that we have introduced the concept of ... once again. When a cell
contains other cells, it is also possible to specify ... on either the left,
right, or both sides of the cell term. All three of these syntaxes are
equivalent in this case. When they appear on the left-hand side of a rule, they
indicate that we don't care what value any cells not explicitly named might
have. For example, we might write <state>... <first> 0 </first> ...</state> on
the left-hand side of a rule in order to indicate that we want to match the
rule when the <first> cell contains a zero, regardless of what the <second>
cell contains. If we had not included this ellipsis, it would have been a
syntax error, because K would have expected you to provide a value for each of
the child cells.

However, if, as in the example above, the ... appears on the right-hand side
of a rule, it instead indicates that the cells not explicitly mentioned under
the cell should be initialized with their default value from the configuration
declaration. In other words, that rule will reset <first> and <second> to the
initial values given for them in the configuration declaration.

You may note the presence of the phrase .Bag here. You can think of this as
the empty set of cells. It is used as the child of a cell when you want to
indicate that no cells should be explicitly named. We will cover other uses
of this term in later lessons.
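
The following sketch shows the right-hand-side use of ... in a complete
definition. It assumes a configuration whose <first> and <second> cells are
initialized statically to 0, and two illustrative commands, bump and reset;
none of these names come from the lesson files.

```k
module RESET-SKETCH
  imports INT

  configuration <T>
                  <k> $PGM:K </k>
                  <state>
                    <first> 0 </first>
                    <second> 0 </second>
                  </state>
                </T>

  syntax KItem ::= "bump" | "reset"

  // Change both cells so that a later reset has a visible effect.
  rule <k> bump => . ...</k>
       <first> F:Int => F +Int 1 </first>
       <second> S:Int => S +Int 2 </second>

  // On the right-hand side, ... fills in every child of <state> that is not
  // named, and .Bag says that none are named, so both children are
  // re-created from the configuration declaration.
  rule <k> reset => . ...</k>
       (_:StateCell => <state>... .Bag ...</state>)
endmodule
```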

Exercises

  1. Modify the definition from the previous exercise in this lesson so that the
    Boolean cell you created is initialized to false. Then add a production
    syntax Stmt ::= Bool ";" Exp, and a rule that uses this Stmt to set the
    value of the Boolean flag. Then add another production
    syntax Stmt ::= "reset" ";" Exp which sets the value of the Boolean flag back
    to its default value via a ... on the right-hand side. You will need to add
    an additional cell around the Boolean cell to make this work.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.16: Maps, Semantic Lists, and Sets.

Lesson 1.16: Maps, Semantic Lists, and Sets

The purpose of this lesson is to explain how to use the data structure sorts
provided by K: maps, lists, and sets.

Maps

The most frequently used type of data structure in K is the map. The sort
provided by K for this purpose is the Map sort, and it is provided in
domains.md in the MAP
module. This type is not (currently) polymorphic. All Map terms are maps that
map terms of sort KItem to other terms of sort KItem. A KItem can contain
any sort except a K sequence. If you need to store such a term in a
map, you can always use a wrapper such as syntax KItem ::= kseq(K).

A Map pattern consists of zero or more map elements (as represented by the
symbol syntax Map ::= KItem "|->" KItem), mixed in any order, separated by
whitespace, with zero or one variables of sort Map. The empty map is
represented by .Map. If all of the bindings for the variables in the keys
of the map can be deterministically chosen, these patterns can be matched in
O(1) time. If they cannot, then each map element that cannot be
deterministically constructed contributes a single dimension of polynomial
time to the cost of the matching. In other words, a single such element is
linear, two are quadratic, three are cubic, etc.

Patterns like the above are the only type of Map pattern that can appear
on the left-hand-side of a rule. In other words, you are not allowed to write
a Map pattern on the left-hand-side with more than one variable of sort Map
in it. You are, however, allowed to write such patterns on the right-hand-side
of a rule. You can also write a function pattern in the key of a map element
so long as all the variables in the function pattern can be deterministically
chosen.

Note the meaning of matching on a Map pattern: a map pattern with no
variables of sort Map matches only if the map being matched has exactly as
many bindings as there are |-> symbols in the pattern, and each binding in
the map pattern matches exactly one distinct binding in the map being
matched. A map pattern with one Map variable will also match any map
that contains such a map as a subset. The variable of sort Map will be bound
to whatever bindings are left over (.Map if there are no bindings left over).
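
The two kinds of pattern can be sketched as follows. The module name, the
<store> cell, and the get and onlyOne commands are assumptions made for this
illustration.

```k
module MAP-PATTERN-SKETCH
  imports INT
  imports MAP

  configuration <k> $PGM:K </k>
                <store> 1 |-> 10 2 |-> 20 </store>

  syntax KItem ::= get(Int) | "onlyOne"

  // One map element plus one Map variable: matches any map that contains
  // KEY. KEY is already bound by the <k> cell, so the binding is chosen
  // deterministically. _REST is bound to the remaining bindings.
  rule <k> get(KEY:Int) => V ...</k>
       <store> KEY |-> V:Int _REST:Map </store>

  // One map element and no Map variable: matches only a map with exactly
  // one binding, so it does not apply to the two-element initial store.
  rule <k> onlyOne => V ...</k>
       <store> _KEY |-> V:Int </store>
endmodule
```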

Here is an example of a simple definition that implements a very basic
variable declaration semantics using a Map to store the value of variables
(lesson-16-a.k):

module LESSON-16-A-SYNTAX
  imports INT-SYNTAX
  imports ID-SYNTAX

  syntax Exp ::= Id | Int
  syntax Decl ::= "int" Id "=" Exp ";" [strict(2)]
  syntax Pgm ::= List{Decl,""}
endmodule

module LESSON-16-A
  imports LESSON-16-A-SYNTAX
  imports BOOL

  configuration <T>
                  <k> $PGM:Pgm </k>
                  <state> .Map </state>
                </T>

  // declaration sequence
  rule <k> D:Decl P:Pgm => D ~> P ...</k>
  rule <k> .Pgm => . ...</k>

  // variable declaration
  rule <k> int X:Id = I:Int ; => . ...</k>
       <state> STATE => STATE [ X <- I ] </state>

  // variable lookup
  rule <k> X:Id => I ...</k>
       <state>... X |-> I ...</state>

  syntax Bool ::= isKResult(K) [symbol, function]
  rule isKResult(_:Int) => true
  rule isKResult(_) => false [owise]
endmodule

There are several new features in this definition. First, note we import
the module ID-SYNTAX. This module is defined in domains.md and provides a
basic syntax for identifiers. We are using the Id sort provided by this
module in this definition to implement the names of program variables. This
syntax is only imported when parsing programs, not when parsing rules. Later in
this lesson we will see how to reference specific concrete identifiers in a
rule.

Second, we introduce a single new function over the Map sort. This function,
which is represented by the symbol
syntax Map ::= Map "[" KItem "<-" KItem "]", represents the map update
operation. Other functions over the Map sort can be found in domains.md.

Finally, we have used the ... syntax on a cell containing a Map. In this
case, <state>... Pattern ...</state>,
<state>... Pattern </state>, and <state> Pattern ...</state> all mean the
same thing: each is equivalent to writing <state> (Pattern) _:Map </state>.

Consider the following program (a.decl):

int x = 0;
int y = 1;
int a = x;

If we run this program with krun, we will get the following result:

<T>
  <k>
    .
  </k>
  <state>
    a |-> 0
    x |-> 0
    y |-> 1
  </state>
</T>

Note that krun has automatically sorted the collection for you. This doesn't
happen at runtime, so you still get the performance of a hash map, but it will
help make the output more readable.

Exercise

Create a sort Stmt that is a subsort of Decl. Create a production of sort
Stmt for variable assignment in addition to the variable declaration
production. Feel free to use the syntax syntax Stmt ::= Id "=" Exp ";". Write
a rule that implements variable assignment using a map update function. Then
write the same rule using a map pattern. Test your implementations with some
programs to ensure they behave as expected.

Semantic Lists

In a previous lesson, we explained how to represent lists in the AST of a
program. However, this is not the only context where lists can be used. We also
frequently use lists in the configuration of an interpreter in order to
represent certain types of program state. For this purpose, it is generally
useful to have an associative-list sort, rather than the cons-list sorts
provided in Lesson 1.12.

The type provided by K for this purpose is the List sort, and it is also
provided in domains.md, in the LIST module. This type is also not
(currently) polymorphic. Like Map, all List terms are lists of terms of the
KItem sort.

A List pattern in K consists of zero or more list elements (as represented by
the ListItem symbol), followed by zero or one variables of sort List,
followed by zero or more list elements. An empty list is represented by
.List. These patterns can be matched in O(log(N)) time. This is the only
type of List pattern that can appear on the left-hand-side of a rule. In
other words, you are not allowed to write a List pattern on the
left-hand-side with more than one variable of sort List in it. You are,
however, allowed to write such patterns on the right-hand-side of a rule.

Note the meaning of matching on a List pattern: a list pattern with no
variables of sort List matches only if the list being matched has exactly as
many elements as there are ListItem symbols in the pattern, and each element
in sequence matches the pattern contained in the corresponding ListItem symbol. A
list pattern with one variable of sort List operates the same way, except
that it can match any list with at least as many elements as ListItem
symbols, so long as the prefix and suffix of the list match the patterns inside
the ListItem symbols. The variable of sort List will be bound to whatever
elements are left over (.List if there are no elements left over).

The ... syntax is allowed on cells containing lists as well. In this case,
the meaning of <cell>... Pattern </cell> is the same as
<cell> _:List (Pattern) </cell>, and the meaning of <cell> Pattern ...</cell>
is the same as <cell> (Pattern) _:List </cell>. Because list patterns with
multiple variables of sort List are not allowed, it is an error to write
<cell>... Pattern ...</cell>.
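
A minimal sketch of these patterns is shown below. The module name, the
<stack> cell, and the push, pop, and last commands are assumptions made for
this illustration.

```k
module LIST-PATTERN-SKETCH
  imports INT
  imports LIST

  configuration <k> $PGM:K </k>
                <stack> .List </stack>

  syntax KItem ::= push(Int) | "pop" | "last"

  // Trailing ...: the rewrite happens at the front of the list, and the
  // rest of the list is matched by an anonymous List variable.
  rule <k> push(I:Int) => . ...</k>
       <stack> .List => ListItem(I) ...</stack>

  // Remove the first element, whatever follows it.
  rule <k> pop => . ...</k>
       <stack> ListItem(_) => .List ...</stack>

  // Leading ...: the pattern is anchored at the end of the list, so this
  // reads the last element without removing it.
  rule <k> last => I ...</k>
       <stack>... ListItem(I:Int) </stack>
endmodule
```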

Here is an example of a simple definition that implements a very basic
function-call semantics using a List as a function stack (lesson-16-b.k):

module LESSON-16-B-SYNTAX
  imports INT-SYNTAX
  imports ID-SYNTAX

  syntax Exp ::= Id "(" ")" | Int
  syntax Stmt ::= "return" Exp ";" [strict]
  syntax Decl ::= "fun" Id "(" ")" "{" Stmt "}"
  syntax Pgm ::= List{Decl,""}
  syntax Id ::= "main" [token]
endmodule

module LESSON-16-B
  imports LESSON-16-B-SYNTAX
  imports BOOL
  imports LIST

  configuration <T>
                  <k> $PGM:Pgm ~> main () </k>
                  <functions> .Map </functions>
                  <fstack> .List </fstack>
                </T>

  // declaration sequence
  rule <k> D:Decl P:Pgm => D ~> P ...</k>
  rule <k> .Pgm => . ...</k>

  // function definitions
  rule <k> fun X:Id () { S } => . ...</k>
       <functions>... .Map => X |-> S ...</functions>

  // function call
  syntax KItem ::= stackFrame(K)
  rule <k> X:Id () ~> K => S </k>
       <functions>... X |-> S ...</functions>
       <fstack> .List => ListItem(stackFrame(K)) ...</fstack>

  // return statement
  rule <k> return I:Int ; ~> _ => I ~> K </k>
       <fstack> ListItem(stackFrame(K)) => .List ...</fstack>

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_) => false [owise]
endmodule

Notice that we have declared the production syntax Id ::= "main" [token].
Since we use the ID-SYNTAX module, this declaration is necessary in order to
be able to refer to the main identifier directly in the configuration
declaration. Our <k> cell now contains a K sequence initially: first we
process all the declarations in the program, then we call the main function.

Consider the following program (foo.func):

fun foo() { return 5; }
fun main() { return foo(); }

When we krun this program, we should get the following output:

<T>
  <k>
    5 ~> .
  </k>
  <functions>
    foo |-> return 5 ;
    main |-> return foo ( ) ;
  </functions>
  <fstack>
    .List
  </fstack>
</T>

Note that we have successfully put on the <k> cell the value returned by the
main function.

Exercise

Add a term of sort Id to the stackFrame operator to keep track of the
name of the function in that stack frame. Then write a function
syntax String ::= printStackTrace(List) that takes the contents of the
<fstack> cell and pretty prints the current stack trace. You can concatenate
strings with +String in the STRING module in domains.md, and you can
convert an Id to a String with the Id2String function in the ID module.
Test this function by creating a new expression that returns the current stack
trace as a string. Make sure to update isKResult and the Exp sort as
appropriate to allow strings as values.

Sets

The final primary data structure sort in K is a set, i.e., an idempotent
unordered collection where elements are deduplicated. The sort provided by K
for this purpose is the Set sort and it is provided in domains.md in the
SET module. Like maps and lists, this type is not (currently) polymorphic.
Like Map and List, all Set terms are sets of terms of the KItem sort.

A Set pattern has the exact same restrictions as a Map pattern, except that
its elements are treated like keys, and there are no values. It has the same
performance characteristics as well. However, syntactically it is more similar
to the List sort: An empty Set is represented by .Set, but a set element
is represented by the SetItem symbol.

Matching behaves similarly to the Map sort: a set pattern with no variables
of sort Set will match if the set has exactly as many elements as SetItem
symbols, and if each element pattern matches one distinct element in the set.
A set pattern with a variable of sort Set also matches any superset of such a
set. As with Map, the elements left over will be bound to the Set variable (or
.Set if no elements are left over).

Like Map, the ... syntax on a set is syntactic sugar for an anonymous
variable of sort Set.
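
The sketch below exercises these patterns on a small scale. The module name,
the <seen> cell, and the visit and forget commands are assumptions made for
this illustration.

```k
module SET-PATTERN-SKETCH
  imports INT
  imports BOOL
  imports SET

  configuration <k> $PGM:K </k>
                <seen> .Set </seen>

  syntax KItem ::= visit(Int) | forget(Int)

  // Add I only if it is not already an element.
  rule <k> visit(I:Int) => . ...</k>
       <seen> S:Set => S SetItem(I) </seen>
    requires notBool I in S

  // I is already present: the ... stands for an anonymous Set variable
  // holding all the other elements.
  rule <k> visit(I:Int) => . ...</k>
       <seen>... SetItem(I) ...</seen>

  // Remove I by rewriting its element to the empty set.
  rule <k> forget(I:Int) => . ...</k>
       <seen>... SetItem(I) => .Set ...</seen>
endmodule
```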

Here is an example of a simple modification to LESSON-16-A which uses a Set
to ensure that variables are never declared more than once. In practice, you
would likely just use the in_keys symbol over maps to test for this, but
it's still useful as an example of sets in practice:

module LESSON-16-C-SYNTAX
  imports LESSON-16-A-SYNTAX
endmodule

module LESSON-16-C
  imports LESSON-16-C-SYNTAX
  imports BOOL
  imports SET

  configuration <T>
                  <k> $PGM:Pgm </k>
                  <state> .Map </state>
                  <declared> .Set </declared>
                </T>

  // declaration sequence
  rule <k> D:Decl P:Pgm => D ~> P ...</k>
  rule <k> .Pgm => . ...</k>

  // variable declaration
  rule <k> int X:Id = I:Int ; => . ...</k>
       <state> STATE => STATE [ X <- I ] </state>
       <declared> D => D SetItem(X) </declared>
    requires notBool X in D

  // variable lookup
  rule <k> X:Id => I ...</k>
       <state>... X |-> I ...</state>
       <declared>... SetItem(X) ...</declared>

  syntax Bool ::= isKResult(K) [symbol, function]
  rule isKResult(_:Int) => true
  rule isKResult(_) => false [owise]
endmodule

Now if we krun a program containing duplicate declarations, it will get
stuck on the declaration.
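
For comparison, here is a sketch of the in_keys alternative mentioned above.
It drops the <declared> cell and checks the keys of the <state> map directly.
The module names are illustrative, and the syntax is repeated from
LESSON-16-A-SYNTAX so that the sketch stands on its own.

```k
module IN-KEYS-SKETCH-SYNTAX
  imports INT-SYNTAX
  imports ID-SYNTAX

  syntax Exp ::= Id | Int
  syntax Decl ::= "int" Id "=" Exp ";" [strict(2)]
  syntax Pgm ::= List{Decl,""}
endmodule

module IN-KEYS-SKETCH
  imports IN-KEYS-SKETCH-SYNTAX
  imports BOOL
  imports MAP

  configuration <T>
                  <k> $PGM:Pgm </k>
                  <state> .Map </state>
                </T>

  // declaration sequence
  rule <k> D:Decl P:Pgm => D ~> P ...</k>
  rule <k> .Pgm => . ...</k>

  // variable declaration: applies only if X is not yet a key of the map
  rule <k> int X:Id = I:Int ; => . ...</k>
       <state> STATE => STATE [ X <- I ] </state>
    requires notBool (X in_keys(STATE))

  // variable lookup
  rule <k> X:Id => I ...</k>
       <state>... X |-> I ...</state>

  syntax Bool ::= isKResult(K) [symbol, function]
  rule isKResult(_:Int) => true
  rule isKResult(_) => false [owise]
endmodule
```

As with LESSON-16-C, a duplicate declaration should get stuck, because the
side condition of the declaration rule is false.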

Exercises

  1. Modify your solution to Lesson 1.14, Exercise 2 and introduce the sorts
    Decls, Decl, and Stmt which include variable and function declaration
    (without function parameters), and return and assignment statements, as well
    as call expressions. Use List and Map to implement these operators, making
    sure to consider the interactions between components, such as saving and
    restoring the environment of variables at each call site. Don't worry about
    local function definitions or global variables for now. Make sure to test the
    resulting interpreter.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.17: Cell Multiplicity and Cell Collections.

Lesson 1.17: Cell Multiplicity and Cell Collections

The purpose of this lesson is to explain how you can create optional cells
and cells that repeat multiple times in a configuration using a feature called
cell multiplicity.

Cell Multiplicity

K allows you to specify attributes for cell productions as part of the syntax
of configuration declarations. Unlike regular productions, which use the []
syntax for attributes, configuration cells use an XML-like attribute syntax:

configuration <k color="red"> $PGM:K </k>

This configuration declaration gives the <k> cell the color red during
unparsing using the color attribute as discussed in
Lesson 1.9.

However, in addition to the usual attributes for productions, there are some
other attributes that can be applied to cells with special meaning. One such
attribute is the multiplicity attribute. By default, each cell that is
declared occurs exactly once in every configuration term. However, using the
multiplicity attribute, this default behavior can be changed. There are two
values that this attribute can have: ? and *.

Optional cells

The first cell multiplicity we will discuss is ?. Similar to a regular
expression language, this attribute tells the compiler that this cell can
appear 0 or 1 times in the configuration. In other words, it is an
optional cell. By default, K does not create optional cells in the initial
configuration, unless that optional cell has a configuration variable inside
it. However, it is possible to override the default behavior and create that
cell initially by adding the additional cell attribute initial="".

K uses the .Bag symbol to represent the absence of any cells in a particular
rule. Consider the following module:

module LESSON-17-A
  imports INT

  configuration <k> $PGM:K </k>
                <optional multiplicity="?"> 0 </optional>

  syntax KItem ::= "init" | "destroy"

  rule <k> init => . ...</k>
       (.Bag => <optional> 0 </optional>)
  rule <k> destroy => . ...</k>
       (<optional> _ </optional> => .Bag)

endmodule

In this definition, when the init symbol is executed, the <optional> cell
is added to the configuration, and when the destroy symbol is executed, it
is removed. Any rule that matches on that cell will only match if that cell is
present in the configuration.
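
The initial="" attribute described above can be sketched in the same style.
In this variant the <optional> cell is expected to be present from the start,
so a rule that reads it can apply without a preceding init. The module name
and the peek command are assumptions made for this illustration.

```k
module OPTIONAL-INITIAL-SKETCH
  imports INT

  // initial="" asks K to create the optional cell in the initial
  // configuration even though it contains no configuration variable.
  configuration <k> $PGM:K </k>
                <optional multiplicity="?" initial=""> 0 </optional>

  syntax KItem ::= "peek" | "destroy"

  // Applies only while the <optional> cell is present.
  rule <k> peek => I ...</k>
       <optional> I:Int </optional>

  rule <k> destroy => . ...</k>
       (<optional> _ </optional> => .Bag)
endmodule
```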

Exercise

Create a simple definition with a Stmts sort that is a List{Stmt,""} and
a Stmt sort with the constructors
syntax Stmt ::= "enable" | "increment" | "decrement" | "disable". The
configuration should have an optional cell that contains an integer that
is created with the enable command, destroyed with the disable command,
and its value is incremented or decremented by the increment and decrement
command.

Cell collections

The second type of cell multiplicity we will discuss is *. Similar to a
regular expression language, this attribute tells the compiler that this cell
can appear 0 or more times in the configuration. In other words, it is a
cell collection. Cells with multiplicity * must be the only child of
their parent cell. As a convention, the inner cell is usually named with the
singular form of what it contains, and the outer cell with the plural form, for
example, "thread" and "threads".

All cell collections are required to have the type attribute set to either
Set or Map. A Set cell collection is represented as a set and behaves
internally the same as the Set sort, although it actually declares a new
sort. A Map cell collection is represented as a Map in which the first
subcell of the cell collection is the key and the remaining cells are the
value.

For example, consider the following module:

module LESSON-17-B
  imports INT
  imports BOOL
  imports ID-SYNTAX

  syntax Stmt ::= Id "=" Exp ";" [strict(2)]
                | "return" Exp ";" [strict]
  syntax Stmts ::= List{Stmt,""}
  syntax Exp ::= Id
               | Int
               | Exp "+" Exp [seqstrict]
               | "spawn" "{" Stmts "}"
               | "join" Exp ";" [strict]

  configuration <threads>
                  <thread multiplicity="*" type="Map">
                    <id> 0 </id>
                    <k> $PGM:K </k>
                  </thread>
                </threads>
                <state> .Map </state>
                <next-id> 1 </next-id>

  rule <k> X:Id => I:Int ...</k>
       <state>... X |-> I ...</state>
  rule <k> X:Id = I:Int ; => . ...</k>
       <state> STATE => STATE [ X <- I ] </state>
  rule <k> S:Stmt Ss:Stmts => S ~> Ss ...</k>
  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>

  rule <thread>...
         <k> spawn { Ss } => NEXTID ...</k>
       ...</thread>
       <next-id> NEXTID => NEXTID +Int 1 </next-id>
       (.Bag =>
       <thread>
         <id> NEXTID </id>
         <k> Ss </k>
       </thread>)

  rule <thread>...
         <k> join ID:Int ; => I ...</k>
       ...</thread>
       (<thread>
         <id> ID </id>
         <k> return I:Int ; ...</k>
       </thread> => .Bag)

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_) => false [owise]
endmodule

This module implements a very basic fork/join semantics. The spawn expression
spawns a new thread to execute a sequence of statements and returns a thread
id, and the join expression waits until a thread executes return and then
evaluates to the value returned by that thread.

Note something quite novel here: the <k> cell is inside a cell of
multiplicity *. Since the <k> cell is just a regular cell (mostly), this
is perfectly allowable. Rules that don't mention a specific thread are
automatically completed to match any thread.
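
To make the completion concrete, here is a small sketch that writes the same kind of rule in both styles. The module name and the trimmed-down grammar are assumptions made for illustration; the explicit form is only meant to convey roughly what the completed rule looks like, not the exact term the compiler generates.

```k
module LESSON-17-B-COMPLETION-SKETCH
  imports INT

  configuration <threads>
                  <thread multiplicity="*" type="Map">
                    <id> 0 </id>
                    <k> $PGM:K </k>
                  </thread>
                </threads>

  syntax Exp ::= Int
               | Exp "+" Exp [left]
               | Exp "*" Exp [left]

  // Short form: no <thread> cell is mentioned, so the rule is completed
  // to apply inside whichever thread's <k> cell matches.
  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>

  // Explicit form: the enclosing <thread> cell is written out by hand,
  // with ... standing for the cells we do not care about (here, <id>).
  rule <thread>...
         <k> I1:Int * I2:Int => I1 *Int I2 ...</k>
       ...</thread>
endmodule
```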

When you execute programs in this language, the cells in the cell collection
get sorted and printed like any other collection, but they still display like
cells. Rules in this language also benefit from all the structural power of
cells, allowing you to omit cells you don't care about or complete the
configuration automatically. This gives you the full power of cells while the
data is still stored as a collection under the hood.

Exercises

  1. Modify the solution from Lesson 1.16, Exercise 1 so that the cell you use to
    keep track of functions in a Map is now a cell collection. Run some programs
    and compare how they get unparsed before and after this change.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.18: Term Equality and the Ternary Operator.

Lesson 1.18: Term Equality and the Ternary Operator

The purpose of this lesson is to introduce how to compare equality of terms in
K, and how to put conditional expressions directly into the right-hand side of
rules.

Term Equality

One major way you can compare whether two terms are equal in K is to simply
match both terms with a variable with the same name. This will only succeed
in matching if the two terms are equal structurally. However, sometimes this
is impractical, and it is useful to have access to a way to actually compare
whether two terms in K are equal. The operator for this is found in
domains.md in the K-EQUAL
module. The operator is ==K and takes two terms of sort K and returns a
Bool. It returns true if they are equal. This includes equality over builtin
types such as Map and Set where equality is not purely structural in
nature. However, it does not include any notion of semantic equality over
user-defined syntax. The inverse symbol for inequality is =/=K.
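
The two approaches can be sketched side by side as follows. The module name, the Color sort, and both function names are hypothetical and chosen only for illustration.

```k
module LESSON-18-EQUALITY-SKETCH
  imports BOOL
  imports K-EQUAL

  syntax Color ::= "red" | "green" | "blue"

  // Approach 1: a variable that appears twice on the left-hand side only
  // matches when both occurrences are structurally equal.
  syntax Bool ::= sameColor(Color, Color) [function]
  rule sameColor(C, C) => true
  rule sameColor(_, _) => false [owise]

  // Approach 2: compare the two terms explicitly using the operators
  // from K-EQUAL (==K for equality, =/=K for inequality).
  syntax Bool ::= differentColor(Color, Color) [function]
  rule differentColor(C1, C2) => C1 =/=K C2
endmodule
```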

Ternary Operator

One way to introduce conditional logic in K is to have two separate rules,
each with a side condition (or one rule with a side condition and another with
the owise attribute). However, sometimes it is useful to explicitly write
a conditional expression directly in the right-hand side of a rule. For this
purpose, K defines one more operator in the K-EQUAL module, which corresponds
to the usual ternary operator found in many languages. Here is an example of its
usage (lesson-18.k):

module LESSON-18
  imports INT
  imports BOOL
  imports K-EQUAL

  syntax Exp ::= Int | Bool | "if" "(" Exp ")" Exp "else" Exp [strict(1)]

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_:Bool) => true

  rule if (B:Bool) E1:Exp else E2:Exp => #if B #then E1 #else E2 #fi
endmodule

Note the symbol on the right-hand side of the final rule. This symbol is
polymorphic: B must be of sort Bool, but E1 and E2 could have been
any sort so long as both were of the same sort, and the sort of the entire
expression becomes equal to that sort. K supports polymorphic built-in
operators, but does not yet allow users to write their own polymorphic
productions.

The behavior of this function is to evaluate the Boolean expression to a
Boolean, then pick one of the two children and return it based on whether the
Boolean is true or false. Please note that it is not a good idea to use this
symbol in cases where one or both of the children is potentially undefined
(for example, an integer expression that divides by zero). While the default
implementation is smart enough to only evaluate the branch that happens to be
picked, this will not be true when we begin to do program verification. If
you need short circuiting behavior, it is better to use a side condition.
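
As a rough illustration of this advice, the sketch below defines the same function twice: once with #if, and once with side conditions. The module and function names are hypothetical, and returning 0 for a zero divisor is an arbitrary choice made for the example.

```k
module LESSON-18-SIDE-CONDITION-SKETCH
  imports INT
  imports BOOL
  imports K-EQUAL

  // Version using #if: both branches belong to a single right-hand side,
  // including the division, which is undefined when B is 0.
  syntax Int ::= divOrZero(Int, Int) [function]
  rule divOrZero(A, B) => #if B ==Int 0 #then 0 #else A /Int B #fi

  // Version using side conditions: the division only appears in the rule
  // whose side condition guarantees a nonzero divisor.
  syntax Int ::= safeDivOrZero(Int, Int) [function]
  rule safeDivOrZero(_, B) => 0        requires B ==Int 0
  rule safeDivOrZero(A, B) => A /Int B requires B =/=Int 0
endmodule
```

The second version is the one to prefer when a branch may be undefined, since no rule ever mentions the division without the guard that makes it defined.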

Exercises

  1. Write a function in K that takes two terms of sort K and returns an
    Int: the Int should be 0 if the terms are equal and 1 if the terms are
    unequal.

  2. Modify your solution to Lesson 1.16, Exercise 1 and introduce an if
    Stmt to the syntax of the language, then implement it using the #if symbol.
    Make sure to write tests for the resulting interpreter.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.19: Debugging with GDB or LLDB.

Lesson 1.19: Debugging with GDB or LLDB

The purpose of this lesson is to teach how to debug your K interpreter using
the K-language support provided in GDB or
LLDB.

Caveats

This lesson has been written with GDB support on Linux in mind. Unfortunately,
on macOS, GDB has limited support. To address this, we have introduced early
experimental support for debugging with LLDB on macOS. In some cases, the
features supported by LLDB are slightly different to those supported by GDB; the
tutorial text will make this clear where necessary. If you are using macOS with
an LLVM version older than 15, you may need to upgrade LLVM in order to use LLDB
correctly. If you encounter an issue on either operating system, please open an
issue against the K repository.

Getting started

On Linux, you will need GDB in order to complete this lesson. If you do not
already have GDB installed, then do so. Steps to install GDB are outlined in
this GDB Tutorial.

On macOS, LLDB should already have been installed with K's build dependencies
(whether you have built K from source, or installed it using kup or Homebrew).

The first thing necessary in order to debug a K interpreter is to build the
interpreter with full debugging support enabled. This can be done relatively
simply. First, run kompile with the command line flag --enable-llvm-debug.
The resulting compiled K definition will be ready to support debugging.

Once you have a compiled K definition and a program you wish to debug, you can
start the debugger by passing the --debugger flag to krun. This will
automatically load the program you are executing into GDB (or LLDB on macOS)
and drop you into a debugger shell ready to start executing the program.

As an example, consider the following K definition (lesson-19-a.k):

module LESSON-19-A
  imports INT

  rule I => I +Int 1
    requires I <Int 100
endmodule

If we compile this definition with kompile lesson-19-a.k --enable-llvm-debug,
and run the program 0 in the debugger with krun -cPGM=0 --debugger, we will
see the following output (roughly, and depending on which platform you are
using):

GDB / Linux

GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./lesson-19-a-kompiled/interpreter...
warning: File "/home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-a-kompiled/interpreter" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
        add-auto-load-safe-path /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-a-kompiled/interpreter
line to your configuration file "/home/dwightguth/.gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/home/dwightguth/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"
(gdb)

To take full advantage of the GDB features of K, you should follow the first
command listed in this output message and add the corresponding
add-auto-load-safe-path command to your ~/.gdbinit file as prompted.
Please note that the path will be different on your machine than the one
listed above. Adding directories to the "load safe path" effectively tells GDB
to trust those directories. All content under a given directory will be recursively
trusted, so if you want to avoid having to add paths to the "load safe path" every
time you kompile a different K definition, then you can just trust a minimal
directory containing all your kompiled files. However, do not choose a top-level
directory containing arbitrary files, as this amounts to trusting arbitrary
files and is a security risk. More info on the load safe path can be found here.

LLDB / macOS

(lldb) target create "./lesson-19-a-kompiled/interpreter"
warning: 'interpreter' contains a debug script. To run this script in this debug session:

    command script import "/Users/brucecollie/code/scratch/lesson-19-a-kompiled/interpreter.dSYM/Contents/Resources/Python/interpreter.py"

To run all discovered debug scripts in this session:

    settings set target.load-script-from-symbol-file true

Current executable set to '/Users/brucecollie/code/scratch/lesson-19-a-kompiled/interpreter' (x86_64).
(lldb) settings set -- target.run-args  ".krun-2023-03-20-11-22-46-TcYt9ffhb2/tmp.in.RupiLwHNfn" "-1" ".krun-2023-03-20-11-22-46-TcYt9ffhb2/result.kore"
(lldb) 

LLDB applies slightly different security policies to GDB. To load K's debugging
scripts for this session only, you can run the command script import line at
the LLDB prompt. The loaded scripts will not persist across debugging sessions
if you do this. It is also possible to configure LLDB to automatically load the
K scripts when an interpreter is started in LLDB; doing so requires a slightly
less broad permission than GDB.

On macOS, the .dSYM directory that contains debugging symbols for an
executable can also contain Python scripts in Contents/Resources/Python. If
there is a Python script with a name matching the name of the current executable
(here, interpreter and interpreter.py), it will be automatically loaded if
the target.load-script-from-symbol-file setting is set. You can therefore add
the settings set command to your ~/.lldbinit without enabling full arbitrary
code execution, but you should be aware of the paths from which code can be
executed if you do so.

Basic commands

LLDB Note: the k start and k step commands are currently not
implemented in the K LLDB scripts. To work around this limitation temporarily,
you can run process launch --stop-at-entry instead of k start. To emulate
k step, first run rbreak k_step once, then run continue in place of each
k step. We hope to address these limitations soon.

The most basic commands you can execute in the K GDB session are to run your
program or to step through it. The first can be accomplished using GDB's
built-in run command. This will automatically start the program and begin
executing it. It will continue until the program aborts or finishes, or the
debugger is interrupted with Ctrl-C.

Sometimes you want finer-grained control over how you proceed through the
program you are debugging. To step through the rule applications in your
program, you can use the k start and k step GDB commands.

k start is similar to the built-in start command in that it starts the
program and then immediately breaks before doing any work. However, unlike
the start command, which breaks as soon as the main function of the program
is entered, the k start command will initialize the rewriter,
evaluate the initial configuration, and break immediately prior to applying
any rewrite steps.

In the example above, here is what we see when we run the k start command:

Temporary breakpoint 1 at 0x239210
Starting program: /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-a-kompiled/interpreter .krun-2021-08-13-14-10-50-sMwBkbRicw/tmp.in.01aQt85TaA -1 .krun-2021-08-13-14-10-50-sMwBkbRicw/result.kore
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Temporary breakpoint 1, 0x0000000000239210 in main ()
0x0000000000231890 in step (subject=<k>
  0 ~> .
</k>)
(gdb)

As you can see, we are stopped at the step function in the interpreter.
This function is responsible for taking top-level rewrite steps. The subject
parameter to this function is the current K configuration.

We can step through K rewrite steps one at a time by running the k step
command. By default, this takes a single rewrite step (including any function
rule applications that are part of that step).

Here is what we see when we run that command:

Continuing.

Temporary breakpoint -22, 0x0000000000231890 in step (subject=<k>
  1 ~> .
</k>)
(gdb)

As we can see, we have taken a single rewrite step. We can also pass a number
to the k step command which indicates the number of rewrite steps to take.

Here is what we see if we run k step 10:

Continuing.

Temporary breakpoint -23, 0x0000000000231890 in step (subject=<k>
  11 ~> .
</k>)
(gdb)

As we can see, ten rewrite steps were taken.

Breakpoints

The next important step in debugging an application in GDB is to be able to
set breakpoints. Generally speaking, there are three types of breakpoints we
are interested in for a K semantics: setting a breakpoint when a particular
function is called, setting a breakpoint when a particular rule is applied,
and setting a breakpoint when a side condition of a rule is evaluated.

The easiest way to do the first two things is to set a breakpoint on the
line of code containing the function or rule.

For example, consider the following K definition (lesson-19-b.k):

module LESSON-19-B
  imports BOOL

  syntax Bool ::= isBlue(Fruit) [function]
  syntax Fruit ::= Blueberry() | Banana()
  rule isBlue(Blueberry()) => true
  rule isBlue(Banana()) => false

  rule F:Fruit => isBlue(F)
endmodule

Once this program has been compiled for debugging, we can run the program
Blueberry(). We can then set a breakpoint that stops when the isBlue
function is called with the following command in GDB:

break lesson-19-b.k:4

Similarly, in LLDB, run:

breakpoint set --file lesson-19-b.k --line 4

Here is what we see if we set this breakpoint and then run the interpreter:

(gdb) break lesson-19-b.k:4
Breakpoint 1 at 0x231040: file /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b.k, line 4.
(gdb) run
Starting program: /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b-kompiled/interpreter .krun-2021-08-13-14-20-27-vXOQmV6lwS/tmp.in.fga98yqXlc -1 .krun-2021-08-13-14-20-27-vXOQmV6lwS/result.kore
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, LblisBlue'LParUndsRParUnds'LESSON-19-B'Unds'Bool'Unds'Fruit (_1=Blueberry ( )) at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b.k:4
4         syntax Bool ::= isBlue(Fruit) [function]
(gdb)

(lldb) breakpoint set --file lesson-19-b.k --line 4
Breakpoint 1: where = interpreter`LblisBlue'LParUndsRParUnds'LESSON-19-B'Unds'Bool'Unds'Fruit + 20 at lesson-19-b.k:4:19, address = 0x0000000100003ff4
(lldb) run
Process 50546 launched: '/Users/brucecollie/code/scratch/lesson-19-b-kompiled/interpreter' (x86_64)
Process 50546 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100003ff4 interpreter`LblisBlue'LParUndsRParUnds'LESSON-19-B'Unds'Bool'Unds'Fruit(_1=Blueberry ( )) at lesson-19-b.k:4:19
   1   	module LESSON-19-B
   2   	  imports BOOL
   3   	
-> 4   	  syntax Bool ::= isBlue(Fruit) [function]
   5   	  syntax Fruit ::= Blueberry() | Banana()
   6   	  rule isBlue(Blueberry()) => true
   7   	  rule isBlue(Banana()) => false
(lldb)

As we can see, we have stopped at the point where we are evaluating that
function. The value _1 that is a parameter to that function shows the
value passed to the function by the caller.

We can also break when the isBlue(Blueberry()) => true rule applies by simply
changing the line number to the line number of that rule:

(gdb) break lesson-19-b.k:6
Breakpoint 1 at 0x2af710: file /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b.k, line 6.
(gdb) run
Starting program: /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b-kompiled/interpreter .krun-2021-08-13-14-32-36-7kD0ic7XwD/tmp.in.8JNH5Qtmow -1 .krun-2021-08-13-14-32-36-7kD0ic7XwD/result.kore
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, apply_rule_138 () at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b.k:6
6         rule isBlue(Blueberry()) => true
(gdb)

(lldb) breakpoint set --file lesson-19-b.k --line 6
Breakpoint 1: where = interpreter`apply_rule_140 at lesson-19-b.k:6:8, address = 0x0000000100004620
(lldb) run
Process 50681 launched: '/Users/brucecollie/code/scratch/lesson-19-b-kompiled/interpreter' (x86_64)
Process 50681 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100004620 interpreter`apply_rule_140 at lesson-19-b.k:6:8
   3   	
   4   	  syntax Bool ::= isBlue(Fruit) [function]
   5   	  syntax Fruit ::= Blueberry() | Banana()
-> 6   	  rule isBlue(Blueberry()) => true
   7   	  rule isBlue(Banana()) => false
   8   	
   9   	  rule F:Fruit => isBlue(F)
(lldb) 

We can also do the same with a top-level rule:

(gdb) break lesson-19-b.k:9
Breakpoint 1 at 0x2aefa0: lesson-19-b.k:9. (2 locations)
(gdb) run
Starting program: /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b-kompiled/interpreter .krun-2021-08-13-14-33-13-9fC8Sz4aO3/tmp.in.jih1vtxSiQ -1 .krun-2021-08-13-14-33-13-9fC8Sz4aO3/result.kore
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, apply_rule_107 (Var'Unds'DotVar0=<generatedCounter>
  0
</generatedCounter>, Var'Unds'DotVar1=., VarF=Blueberry ( )) at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b.k:9
9         rule F:Fruit => isBlue(F)
(gdb)

(lldb) breakpoint set --file lesson-19-b.k --line 9
Breakpoint 1: 2 locations.
(lldb) run
Process 50798 launched: '/Users/brucecollie/code/scratch/lesson-19-b-kompiled/interpreter' (x86_64)
Process 50798 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100003f2e interpreter`apply_rule_109(Var'Unds'DotVar0=<generatedCounter>
  0
</generatedCounter>, Var'Unds'DotVar1=., VarF=Blueberry ( )) at lesson-19-b.k:9:8
   6   	  rule isBlue(Blueberry()) => true
   7   	  rule isBlue(Banana()) => false
   8   	
-> 9   	  rule F:Fruit => isBlue(F)
   10  	endmodule
(lldb)  

Unlike the function rule above, we see several parameters to this function.
These make up the substitution that was matched for the rule. Variables only
appear in this substitution if they are actually used on the right-hand side
of the rule.
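
As a small sketch of what this means in practice, consider the hypothetical definition below (the module name and the basket symbol are not part of the lesson's files). Based on the behavior described above, we would expect a breakpoint on its rule to show a parameter for F1, which is used on the right-hand side, but none for _F2, which is only matched. The exact parameter names the debugger prints are not something this sketch tries to predict.

```k
module LESSON-19-SUBSTITUTION-SKETCH

  syntax Fruit ::= Blueberry() | Banana()
  syntax Basket ::= basket(Fruit, Fruit)

  // F1 is used on the right-hand side, so it is part of the substitution
  // passed to the rule's function. _F2 is matched but never used.
  rule basket(F1:Fruit, _F2:Fruit) => F1

endmodule
```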

Advanced breakpoints

Sometimes it is inconvenient to set the breakpoint based on a line number.

It is also possible to set a breakpoint based on the rule label of a particular
rule. Consider the following definition (lesson-19-c.k):

module LESSON-19-C
  imports INT
  imports BOOL

  syntax Bool ::= isEven(Int) [function]
  rule [isEven]: isEven(I) => true requires I %Int 2 ==Int 0
  rule [isOdd]: isEven(I) => false requires I %Int 2 =/=Int 0

endmodule

We will run the program isEven(4). We can set a breakpoint for when a rule
applies by means of the MODULE-NAME.label.rhs syntax:

(gdb) break LESSON-19-C.isEven.rhs
Breakpoint 1 at 0x2afda0: file /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c.k, line 6.
(gdb) run
Starting program: /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c-kompiled/interpreter .krun-2021-08-13-14-40-29-LNNT8YEZ61/tmp.in.ZG93vWCGGC -1 .krun-2021-08-13-14-40-29-LNNT8YEZ61/result.kore
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, LESSON-19-C.isEven.rhs () at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c.k:6
6         rule [isEven]: isEven(I) => true requires I %Int 2 ==Int 0
(gdb)

(lldb) breakpoint set --name LESSON-19-C.isEven.rhs
Breakpoint 1: where = interpreter`LESSON-19-C.isEven.rhs at lesson-19-c.k:6:18, address = 0x00000001000038e0
(lldb) run
Process 51205 launched: '/Users/brucecollie/code/scratch/lesson-19-c-kompiled/interpreter' (x86_64)
Process 51205 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00000001000038e0 interpreter`LESSON-19-C.isEven.rhs at lesson-19-c.k:6:18
   3   	  imports BOOL
   4   	
   5   	  syntax Bool ::= isEven(Int) [function]
-> 6   	  rule [isEven]: isEven(I) => true requires I %Int 2 ==Int 0
   7   	  rule [isOdd]: isEven(I) => false requires I %Int 2 =/=Int 0
   8   	
   9   	endmodule
(lldb) 

We can also set a breakpoint for when a rule's side condition is evaluated
by means of the MODULE-NAME.label.sc syntax:

(gdb) break LESSON-19-C.isEven.sc
Breakpoint 1 at 0x2afd70: file /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c.k, line 6.
(gdb) run
Starting program: /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c-kompiled/interpreter .krun-2021-08-13-14-41-48-1BoGfJRbYc/tmp.in.kg4F8cwfCe -1 .krun-2021-08-13-14-41-48-1BoGfJRbYc/result.kore
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, LESSON-19-C.isEven.sc (VarI=4) at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c.k:6
6         rule [isEven]: isEven(I) => true requires I %Int 2 ==Int 0
(gdb) finish
Run till exit from #0  LESSON-19-C.isEven.sc (VarI=4) at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c.k:6
0x00000000002b2662 in LblisEven'LParUndsRParUnds'LESSON-19-C'Unds'Bool'Unds'Int (_1=4) at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c.k:5
5         syntax Bool ::= isEven(Int) [function]
Value returned is $1 = true
(gdb)

(lldb) breakpoint set --name LESSON-19-C.isEven.sc
Breakpoint 1: where = interpreter`LESSON-19-C.isEven.sc + 1 at lesson-19-c.k:6:18, address = 0x00000001000038c1
(lldb) run
Process 52530 launched: '/Users/brucecollie/code/scratch/lesson-19-c-kompiled/interpreter' (x86_64)
Process 52530 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00000001000038c1 interpreter`LESSON-19-C.isEven.sc(VarI=0x0000000101800088) at lesson-19-c.k:6:18
   3   	  imports BOOL
   4   	
   5   	  syntax Bool ::= isEven(Int) [function]
-> 6   	  rule [isEven]: isEven(I) => true requires I %Int 2 ==Int 0
   7   	  rule [isOdd]: isEven(I) => false requires I %Int 2 =/=Int 0
   8   	
   9   	endmodule
(lldb) finish
Process 52649 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step out
Return value: (bool) $0 = true

    frame #0: 0x00000001000069e5 interpreter`LblisEven'LParUndsRParUnds'LESSON-19-C'Unds'Bool'Unds'Int(_1=0x0000000101800088) at lesson-19-c.k:5:19
   2   	  imports INT
   3   	  imports BOOL
   4   	
-> 5   	  syntax Bool ::= isEven(Int) [function]
   6   	  rule [isEven]: isEven(I) => true requires I %Int 2 ==Int 0
   7   	  rule [isOdd]: isEven(I) => false requires I %Int 2 =/=Int 0
   8
(lldb)

Here we have used the built-in command finish to tell us whether the side
condition returned true or not. Note that once again, we see the substitution
that was matched from the left-hand side. Like before, a variable will only
appear here if it is used in the side condition.

Debugging rule matching

Sometimes it is useful to try to determine why a particular rule did or did
not apply. K provides some basic debugging commands which make it easier
to determine this.

Consider the following K definition (lesson-19-d.k):

module LESSON-19-D

  syntax Foo ::= foo(Bar)
  syntax Bar ::= bar(Baz) | bar2(Baz)
  syntax Baz ::= baz() | baz2()

  rule [baz]: foo(bar(baz())) => .K

endmodule

Suppose we try to run the program foo(bar(baz2())). It is obvious from this
example why the rule in this definition will not apply. However, in practice,
such cases are not always obvious. You might look at a rule and not immediately
spot why it didn't apply on a particular term. For this reason, it can be
useful to get the debugger to provide a log about how it tried to match that
term. You can do this with the k match command. If you are stopped after
having run k start or k step, you can obtain this log for any rule after
any step by running the command k match MODULE.label subject for a particular
top-level rule label.

For example, with the baz rule above, we get the following output:

(gdb) k match LESSON-19-D.baz subject
Subject:
baz2 ( )
does not match pattern:
baz ( )

(lldb) k match LESSON-19-D.baz subject
Subject:
baz2 ( )
does not match pattern:
baz ( )

As we can see, it provided the exact subterm which did not match against the
rule, as well as the particular subpattern it ought to have matched against.

This command does not actually take any rewrite steps. In the event that
matching actually succeeds, you will still need to run the k step command
to advance to the next step.

Final notes

In addition to the functionality provided above, you have the full power of
GDB or LLDB at your disposal when debugging. Some features are not particularly
well-adapted to K code and may require more advanced knowledge of the
term representation or implementation to use effectively, but anything that
can be done in GDB or LLDB can in theory be done using this debugging functionality.
We suggest you refer to the
GDB Documentation or
LLDB Tutorial if you
want to try to do something and are unsure as to how.

Exercises

  1. Compile your solution to Lesson 1.18, Exercise 2 with debugging support
    enabled and step through several programs you have previously used to test.
    Then set a breakpoint on the isKResult function and observe the state of the
    interpreter when stopped at that breakpoint. Set a breakpoint on the rule for
    addition and run a program that causes it to be stopped at that breakpoint.
    Finally, step through the program until the addition symbol is at the top
    of the K cell, and then use the k match command to report the reason why
    the subtraction rule does not apply. You may need to modify the definition
    to insert some rule labels.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.20: K Backends and the Haskell Backend.

Lesson 1.20: K Backends and the Haskell Backend

The purpose of this lesson is to teach about the multiple backends of K,
in particular the Haskell Backend, which is the complement of the backend we
have been using so far.

K Backends

Thus far, we have not discussed the distinction between the K frontend and
the K backends at all. We have simply assumed that if you run kompile on a
K definition, there will be a compiler backend that will allow you to execute
the K definition you have compiled.

K actually has multiple different backends. The one we have been using so far
implicitly, the default backend, is called the LLVM Backend. It is
designed to support efficient, optimized concrete execution and search. It
does this by compiling your K definition to LLVM bitcode and then using LLVM
to generate machine code from that bitcode, which is then linked and executed.
However, K is a formal methods toolkit at the end of the day, and the primary
goal many people have when defining a programming language in K is to
ultimately be able to perform more advanced verification on programs in their
programming language.

It is for this purpose that K also provides the Haskell Backend, so called
because it is implemented in Haskell. While we will cover the features of the
Haskell Backend in more detail in the next two lessons, the important thing to
understand is that it is a separate backend which is optimized for more formal
reasoning about programming languages. While it is capable of performing
concrete execution, it does not do so as efficiently as the LLVM Backend.
In exchange, it provides more advanced features.

Choosing a backend

You can choose which backend to use to compile a K definition by means of the
--backend flag to kompile. If you do not specify this flag, the default is
equivalent to specifying --backend llvm. However, to use the
Haskell Backend instead, you can simply say kompile --backend haskell on a
particular K definition.

As an example, here is a simple K definition that we saw in the
previous lesson (lesson-20.k):

module LESSON-20
  imports INT

  rule I => I +Int 1
    requires I <Int 100
endmodule

Previously we compiled this definition using the LLVM Backend, but if we
instead execute the command kompile lesson-20.k --backend haskell, we
will get an interpreter for this K definition that is implemented in Haskell
instead. Unlike the default LLVM Backend, the Haskell Backend is not a
compiler per se. It does not generate new Haskell code corresponding to your
programming language and then compile and execute it. Instead, it is a single
interpreter, implemented in Haskell, which reads the IR generated by kompile
and is capable of interpreting any K definition.

Note that on arm64 macOS (Apple Silicon), there is a known issue with the Compact
library that causes crashes in the Haskell backend. Pass the additional flag
--no-haskell-binary to kompile to resolve this.
This flag is also needed when using krun.

Exercise

Try running the program 0 in this K definition on the Haskell Backend and
compare the final configuration to what you would get compiling the same
definition with the LLVM Backend.

Legacy backends

As a quick note, K provides one other backend, the Java Backend, which exists
primarily as legacy code and should be considered deprecated. The Java Backend
is essentially a precursor to the Haskell Backend. We will not cover this
backend in any detail since it is deprecated, but we still mention it here so
that you can recognize it if you encounter it.

Exercises

  1. Compile your solution to Lesson 1.18, Exercise 2 with the Haskell Backend
    and execute some programs. Compare the resulting configurations with the
    output of the same program on the LLVM Backend. Note that if you are getting
    different behaviors on the Haskell backend, you might have some luck debugging
    by passing --search to krun when using the LLVM backend.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.21: Unification and Symbolic Execution.

Lesson 1.21: Unification and Symbolic Execution

The purpose of this lesson is to teach the basic concepts of symbolic execution
in order to introduce the unique capabilities of the Haskell Backend at a
conceptual level.

Symbolic Execution

Thus far, all of the programs we have run using K have been concrete
configurations. What this means is that the configuration we use to initialize
the K rewrite engine is concrete; in other words, contains no logical
variables. The LLVM Backend is a concrete execution engine, meaning that
it is only capable of rewriting concrete configurations.

By contrast, the Haskell Backend performs symbolic execution: it is
capable of rewriting any configuration, including those where parts of the
configuration are symbolic, i.e., contain variables or uninterpreted
functions.

Unification

Previously, we have introduced the concept that K rewrite rules operate by
means of pattern matching: the current configuration being rewritten is pattern
matched against the left-hand side of the rewrite rule, and the resulting
substitution is used to construct a new term from the right-hand side. In
symbolic execution, we use unification instead of pattern matching. To
summarize, unification behaves like two-way pattern matching in which both the
configuration and the left-hand side of the rule can contain variables; the
algorithm generates a most general unifier, that is, a substitution for the
variables in both terms which makes the two terms equal.

Feasibility

Unification by itself cannot completely solve the problem of symbolic
execution. One task symbolic execution must perform is to identify whether
a particular symbolic term is feasible, that is to say, that there actually
exists a concrete instantiation of that term such that all the logical
constraints on that term can actually be satisfied. The Haskell Backend
delegates this task to Z3, an
SMT solver.
This solver is used to periodically trim configurations that are determined
to be mathematically infeasible.

Symbolic terms

The final component of symbolic execution consists of the task of introducing
symbolic terms into the configuration. This can be done in one of two ways.
First, the term being passed to krun can actually be symbolic. This
is less frequently used because it requires the user to construct an AST
that contains variables, something which our current parsing capabilities are
not well-equipped to do. The second, more common, way of introducing symbolic
terms into a configuration consists of writing rules where there exists an
existentially quantified variable on the right-hand side of the rule that does
not exist on the left-hand side of the rule.

In order to prevent users from writing such rules by accident, K requires
that such variables begin with the ? prefix. For example, here is a rule
that rewrites a constructor foo to a symbolic integer:

rule <k> foo => ?X:Int ...</k>

When this rule applies, a fresh variable is introduced to the configuration, which
then is unified against the rules that might apply in order to symbolically
execute that configuration.

ensures clauses

We also introduce here a new feature of K rules that applies when a rule
has this type of variable on the right-hand side: the ensures clause.
An ensures clause is similar to a requires clause and can appear after
a rule body, or after a requires clause. The ensures clause is used to
introduce constraints that might apply to the variable that was introduced by
that rule. For example, we could write the rule above with the additional
constraint that the symbolic integer that was introduced must be less than
five, by means of the following rule:

rule <k> foo => ?X:Int ...</k> ensures ?X <Int 5

Putting it all together

Putting all these pieces together, it is possible to use the Haskell Backend
to perform symbolic reasoning about a particular K module, determining all the
possible states that can be reached by a symbolic configuration.

For example, consider the following K definition (lesson-21.k):

module LESSON-21
    imports INT

    rule <k> 0 => ?X:Int ... </k> ensures ?X =/=Int 0
    rule <k> X:Int => 5  ... </k> requires X >=Int 10
endmodule

When we symbolically execute the program 0, we get the following output
from the Haskell Backend:

    <k>
      5 ~> .
    </k>
  #And
    {
      true
    #Equals
      ?X:Int >=Int 10
    }
  #And
    #Not ( {
      ?X:Int
    #Equals
      0
    } )
#Or
    <k>
      ?X:Int ~> .
    </k>
  #And
    #Not ( {
      true
    #Equals
      ?X:Int >=Int 10
    } )
  #And
    #Not ( {
      ?X:Int
    #Equals
      0
    } )

Note some new symbols introduced by this configuration: #And, #Or, #Not, and
#Equals. While andBool, orBool, notBool, and ==K represent functions of sort
Bool, #And, #Or, #Not, and #Equals are matching logic connectives. We
will discuss matching logic in more detail later in the tutorial, but the basic
idea is that these symbols represent Boolean operators over the domain of
configurations and constraints, as opposed to over the Bool sort.

Notice that the configuration listed above is a disjunction of conjunctions.
This is the most common form of output that can be produced by the Haskell
Backend. In this case, each conjunction consists of a configuration and a set
of constraints. What this conjunction describes, essentially, is a
configuration and a set of information that was derived to be true while
rewriting that configuration.

Similar to how we saw --search in a previous lesson, the reason we have
multiple disjuncts is because there are multiple possible output states
for this program, depending on whether or not the second rule applied. In the
first case, we see that ?X is greater than or equal to 10, so the second rule
applied, rewriting the symbolic integer to the concrete integer 5. In the
second case, we see that the second rule did not apply because ?X is less
than 10. Moreover, because of the ensures clause on the first rule, we know
that ?X is not zero, therefore the first rule will not apply a second time.
If we had omitted this constraint, we would have ended up infinitely applying
the first rule, leading to krun not terminating.
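The case split described above can be mimicked in a few lines of Python. This is only a sketch under loud assumptions: a brute-force search over a small integer range stands in for the SMT solver (the real backend queries Z3 and is not limited to a finite range), and the names feasible, branches, and the constraint constants are hypothetical helpers invented for this example.

```python
# Sketch of the two disjuncts produced for LESSON-21.
# ASSUMPTION: a bounded search replaces the SMT solver; this is only adequate
# for this small example and is not how the Haskell Backend decides feasibility.

SEARCH_RANGE = range(-50, 51)

# Each constraint is a (label, predicate over ?X) pair.
NONZERO = ("?X =/=Int 0", lambda x: x != 0)        # ensures clause of the first rule
IS_ZERO = ("?X ==Int 0", lambda x: x == 0)         # needed for the first rule to apply again
AT_LEAST_10 = ("?X >=Int 10", lambda x: x >= 10)   # requires clause of the second rule
BELOW_10 = ("not (?X >=Int 10)", lambda x: not x >= 10)

def feasible(constraints):
    # A path condition is feasible if some integer satisfies every constraint.
    return any(all(pred(x) for _, pred in constraints) for x in SEARCH_RANGE)

def branches(constraints):
    """Split the state <k> ?X </k> on whether the second rule applies."""
    candidates = [
        (5, constraints + (AT_LEAST_10,)),   # second rule applied
        ("?X", constraints + (BELOW_10,)),   # second rule did not apply
    ]
    return [(term, [label for label, _ in cs])
            for term, cs in candidates if feasible(cs)]

after_first_rule = (NONZERO,)

# With the ensures clause, the first rule cannot apply a second time ...
print(feasible(after_first_rule + (IS_ZERO,)))
# ... and the remaining state splits into the two disjuncts shown above.
for term, labels in branches(after_first_rule):
    print(term, labels)
```

Dropping NONZERO from after_first_rule makes the IS_ZERO branch feasible again, which corresponds to the non-terminating behavior described above.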

In the next lesson, we will cover how symbolic execution forms the backbone
of deductive program verification in K and how we can use K to prove programs
correct against a specification.

Exercises

  1. Create another rule in LESSON-21 that rewrites odd integers greater than
    10 to a symbolic even integer less than 10 and greater than 0. This rule will
    now apply nondeterministically along with the existing rules. Predict what the
    resulting output configuration will be from rewriting 0 after adding this
    rule. Then run the program and see whether your prediction is correct.

Once you have completed the above exercises, you can continue to
Lesson 1.22: Basics of Deductive Program Verification using K.

Lesson 1.22: Basics of Deductive Program Verification using K

In this lesson, you will familiarize yourself with the basics of using K for
deductive program verification.

1. Setup: Simple Programming Language with Function Calls

We base this lesson on a simple programming language with functions,
assignment, if conditionals, and while loops. Take your time to study its
formalization below (lesson-22.k):

module LESSON-22-SYNTAX
    imports INT-SYNTAX
    imports BOOL-SYNTAX
    imports ID-SYNTAX

    syntax Exp ::= IExp | BExp

    syntax IExp ::= Id | Int

    syntax KResult ::= Int | Bool | Ints

    // Take this sort structure:
    //
    //     IExp
    //    /    \
    // Int      Id
    //
    // Through the List{_, ","} functor.
    // Must add a `Bot`, for a common subsort for the empty list.

    syntax Bot
    syntax Bots ::= List{Bot, ","} [klabel(exps)]
    syntax Ints ::= List{Int, ","} [klabel(exps)]
                  | Bots
    syntax Ids  ::= List{Id, ","}  [klabel(exps)]
                  | Bots
    syntax Exps ::= List{Exp, ","} [klabel(exps), seqstrict]
                  | Ids | Ints

    syntax IExp ::= "(" IExp ")" [bracket]
                  | IExp "+" IExp [seqstrict]
                  | IExp "-" IExp [seqstrict]
                  > IExp "*" IExp [seqstrict]
                  | IExp "/" IExp [seqstrict]
                  > IExp "^" IExp [seqstrict]
                  | Id "(" Exps ")" [strict(2)]

    syntax BExp ::= Bool

    syntax BExp ::= "(" BExp ")" [bracket]
                  | IExp "<=" IExp [seqstrict]
                  | IExp "<"  IExp [seqstrict]
                  | IExp ">=" IExp [seqstrict]
                  | IExp ">"  IExp [seqstrict]
                  | IExp "==" IExp [seqstrict]
                  | IExp "!=" IExp [seqstrict]

    syntax BExp ::= BExp "&&" BExp
                  | BExp "||" BExp

    syntax Stmt ::=
         Id "=" IExp ";" [strict(2)]                        // Assignment
       | Stmt Stmt [left]                                   // Sequence
       | Block                                              // Block
       | "if" "(" BExp ")" Block "else" Block [strict(1)]   // If conditional
       | "while" "(" BExp ")" Block                         // While loop
       | "return" IExp ";"                    [seqstrict]   // Return statement
       | "def" Id "(" Ids ")" Block                         // Function definition

    syntax Block ::=
         "{" Stmt "}"    // Block with statement
       | "{" "}"         // Empty block
endmodule

module LESSON-22
    imports INT
    imports BOOL
    imports LIST
    imports MAP
    imports LESSON-22-SYNTAX

    configuration
      <k> $PGM:Stmt </k>
      <store> .Map </store>
      <funcs> .Map </funcs>
      <stack> .List </stack>

 // -----------------------------------------------
    rule <k> I1 + I2 => I1 +Int I2 ... </k>
    rule <k> I1 - I2 => I1 -Int I2 ... </k>
    rule <k> I1 * I2 => I1 *Int I2 ... </k>
    rule <k> I1 / I2 => I1 /Int I2 ... </k>
    rule <k> I1 ^ I2 => I1 ^Int I2 ... </k>

    rule <k> I:Id => STORE[I] ... </k>
         <store> STORE </store>

 // ------------------------------------------------
    rule <k> I1 <= I2 => I1  <=Int I2 ... </k>
    rule <k> I1  < I2 => I1   <Int I2 ... </k>
    rule <k> I1 >= I2 => I1  >=Int I2 ... </k>
    rule <k> I1  > I2 => I1   >Int I2 ... </k>
    rule <k> I1 == I2 => I1  ==Int I2 ... </k>
    rule <k> I1 != I2 => I1 =/=Int I2 ... </k>

    rule <k> B1 && B2 => B1 andBool B2 ... </k>
    rule <k> B1 || B2 => B1  orBool B2 ... </k>

    rule <k> S1:Stmt S2:Stmt => S1 ~> S2 ... </k>

    rule <k> ID = I:Int ; => . ... </k>
         <store> STORE => STORE [ ID <- I ] </store>

    rule <k> { S } => S ... </k>
    rule <k> {   } => . ... </k>

    rule <k> if (true)   THEN else _ELSE => THEN ... </k>
    rule <k> if (false) _THEN else  ELSE => ELSE ... </k>

    rule <k> while ( BE ) BODY => if ( BE ) { BODY while ( BE ) BODY } else { } ... </k>

    rule <k> def FNAME ( ARGS ) BODY => . ... </k>
         <funcs> FS => FS [ FNAME <- def FNAME ( ARGS ) BODY ] </funcs>

    rule <k> FNAME ( IS:Ints ) ~> CONT => #makeBindings(ARGS, IS) ~> BODY </k>
         <funcs> ... FNAME |-> def FNAME ( ARGS ) BODY ... </funcs>
         <store> STORE => .Map </store>
         <stack> .List => ListItem(state(CONT, STORE)) ... </stack>

    rule <k> return I:Int ; ~> _ => I ~> CONT </k>
         <stack> ListItem(state(CONT, STORE)) => .List ... </stack>
         <store> _ => STORE </store>

    rule <k> return I:Int ; ~> . => I </k>
         <stack> .List </stack>

    syntax KItem ::= #makeBindings(Ids, Ints)
                   | state(continuation: K, store: Map)
 // ----------------------------------------------------
    rule <k> #makeBindings(.Ids, .Ints) => . ... </k>
    rule <k> #makeBindings((I:Id, IDS => IDS), (IN:Int, INTS => INTS)) ... </k>
         <store> STORE => STORE [ I <- IN ] </store>
endmodule

Next, compile this example using kompile lesson-22.k --backend haskell. If
you are on an Apple Silicon processor and the compilation fails, add the
--no-haskell-binary flag.

2. Setup: Proof Environment

Next, take the following snippet of K code and save it in lesson-22-spec.k.
This is a skeleton of the proof environment, and we will complete it as the
lesson progresses.

requires "lesson-22.k"
requires "domains.md"

module LESSON-22-SPEC-SYNTAX
    imports LESSON-22-SYNTAX

endmodule

module VERIFICATION
    imports K-EQUAL
    imports LESSON-22-SPEC-SYNTAX
    imports LESSON-22
    imports MAP-SYMBOLIC

endmodule

module LESSON-22-SPEC
    imports VERIFICATION

endmodule

3. Claims

  1. The first claim we will ask K to prove is that 3 + 4, in fact, equals 7.
    Claims are stated using the claim keyword, followed by the claim
    statement:

claim <k> 3 + 4 => 7 ... </k>

Add this claim to the LESSON-22-SPEC module and run the K prover using the
command kprove lesson-22-spec.k. You should get back the output #Top,
which denotes the Matching Logic equivalent of true and means, in this
context, that all claims have been proven correctly.

  2. The second claim reasons about the if statement that has a concrete condition:

claim <k> if ( 3 + 4 == 7 ) {
            $a = 1 ;
            } else {
            $a = 2 ;
            }
        => . ... </k>
        <store> STORE => STORE [ $a <- 1 ] </store>

stating that the given program terminates (=> .), and when it does, the value
of the variable $a is set to 1, meaning that the execution will have taken
the then branch. Add this claim to the LESSON-22-SPEC module, but also add

syntax Id ::= "$a" [token]

to the LESSON-22-SPEC-SYNTAX module in order to declare $a as a token so
that it can be used as a program variable. Re-run the K prover, which should
again return #Top.

  3. Our third claim demonstrates how to reason about both branches of an if
    statement at the same time:

claim <k> $a = A:Int ; $b = B:Int ;
          if ($a < $b) {
            $c = $b ;
          } else {
            $c = $a ;
          }
        => . ... </k>
        <store> STORE => STORE [ $a <- A ] [ $b <- B ] [ $c <- ?C:Int ] </store>
    ensures (?C ==Int A) orBool (?C ==Int B)

The program in question first assigns symbolic integers A and B to program
variables $a and $b, respectively, and then executes the given if
statement, which has a symbolic condition (A < B), updating the value of the
program variable $c in both branches. The specification we give states that
the if statement terminates, with $a and $b updated, respectively, to A
and B, and $c updated to some symbolic integer value ?C. Via the
ensures clause, which is used to specify additional constraints that hold
after execution, we also state that this existentially quantified ?C equals
either A or B.

Add the productions declaring $b and $c as tokens to the
LESSON-22-SPEC-SYNTAX module, the claim to the LESSON-22-SPEC module, run
the K prover again, and observe the output, which should not be #Top this
time. This means that K was not able to prove the claim, and we now need to
understand why. We do so by examining the output, which should look as follows:

    (InfoReachability) while checking the implication:
    The configuration's term unifies with the destination's term,
    but the implication check between the conditions has failed.

  #Not (
    #Exists ?C . {
        STORE [ $a <- A:Int ] [ $b <- B:Int ] [ $c <- ?C:Int ]
      #Equals
        STORE [ $a <- A:Int ] [ $b <- B:Int ] [ $c <- B:Int ]
    }
  #And
    {
      true
    #Equals
      ?C ==Int A orBool ?C ==Int B
    }
  )
#And
  <generatedTop>
    <k>
      _DotVar1
    </k>
    <store>
      STORE [ $a <- A:Int ] [ $b <- B:Int ] [ $c <- B:Int ]
    </store>
    <funcs>
      _Gen3
    </funcs>
    <stack>
      _Gen5
    </stack>
  </generatedTop>
#And
  {
    true
  #Equals
    A <Int B
  }

This output starts with a message telling us at which point the proof failed,
followed by the final state, which consists of three parts: some negative
Matching Logic (ML) constraints, the final configuration (<generatedTop> ... </generatedTop>), and some positive ML constraints. Generally speaking,
these positive and the negative constraints could arise from various sources,
such as (but not limited to) branches taken by the execution
(e.g. { true #Equals A <Int B } or #Not ( { true #Equals A <Int B } )),
or ensures constraints.

First, we examine the message:

(InfoReachability) while checking the implication:
The configuration's term unifies with the destination's term,
but the implication check between the conditions has failed.

which tells us that the structure of the final configuration is as expected,
but that some of the associated constraints cannot be proven. We next look at
the final configuration, in which the relevant item is the <store> ... </store> cell, because it is the only one that we are reasoning about. By
inspecting its contents:

STORE [ $a <- A:Int ] [ $b <- B:Int ] [ $c <- B:Int ]

we see that we should be within the constraints of the ensures, since the
value of $c in the store equals B in this branch. We next examine the
negative and positive constraints of the output and, more often than not, the
goal is to instruct K how to use the information from the final configuration
and the positive constraints to falsify one of the negative constraints. This
is done through simplifications.

So, the positive constraint that we have is

{ true #Equals A <Int B }

meaning that A <Int B holds. Given the analysed program, this tells us that
we are in the then branch of the if. The negative constraint is

  #Not (
    #Exists ?C . {
        STORE [ $a <- A:Int ] [ $b <- B:Int ] [ $c <- ?C:Int ]
      #Equals
        STORE [ $a <- A:Int ] [ $b <- B:Int ] [ $c <- B:Int ]
    }
  #And
    { true #Equals ?C ==Int A orBool ?C ==Int B }
  )

and we observe, from the first equality, that the existential ?C should be
instantiated with B. This would make both branches of the #And true,
falsifying the outside #Not. We just need to show K how to conclude that
?C ==Int B. We do so by introducing the following simplification into the
VERIFICATION module:

rule { M:Map [ K <- V ] #Equals M [ K <- V' ] } => { V #Equals V' } [simplification]

which captures the reasoning that lets us conclude ?C ==Int B. The rule states
that when we update the same key in the same map with two values, and the
resulting maps are equal, then the two values must be equal as well. The
[simplification] attribute tells K to use this rule to simplify the
state when trying to prove claims. Like function rules, simplification rules
do not complete to the top of the configuration, but instead apply anywhere
their left-hand side matches. Re-run the K prover, which should now return
#Top, indicating that K was able to use the simplification and prove the
required claims.
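The property that this simplification relies on can be checked on concrete maps. The Python sketch below is not a proof and is not part of the K definition; it merely exercises the statement on a few values, using a hypothetical update helper that mirrors M [ K <- V ].

```python
# Concrete illustration of the simplification rule: updating the same key of
# the same map with V and V' yields equal maps exactly when V == V'.

def update(m, key, value):
    # Functional map update mirroring M [ K <- V ]: returns a new dict.
    return {**m, key: value}

store = {"$a": 1, "$b": 2}

for v in range(-3, 4):
    for v_prime in range(-3, 4):
        maps_equal = update(store, "$c", v) == update(store, "$c", v_prime)
        # Equal maps force equal values, and vice versa.
        assert maps_equal == (v == v_prime)

print("checked", 7 * 7, "pairs of values")
```

In the failed proof above, the two maps differ only in the value stored at $c (?C on one side, B on the other), so the rule reduces the map equality to ?C being equal to B.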

  4. Next, we show how to state and prove properties of while loops. In
    particular, we consider the following loop:

claim
    <k>
        while ( 0 < $n ) {
            $s = $s + $n;
            $n = $n - 1;
            } => . ...
    </k>
    <store>
        $s |-> (S:Int => S +Int ((N +Int 1) *Int N /Int 2))
        $n |-> (N:Int => 0)
    </store>
    requires N >=Int 0

which adds the sum of the first $n integers to $s, assuming the value of $n
is non-negative to begin with. This is reflected in the store by stating that,
after the execution of the loop, the original value of $s (which is set to
equal some symbolic integer S) is incremented by
((N +Int 1) *Int N /Int 2), and the final value of $n is 0. Add $n and $s as
tokens in the LESSON-22-SPEC-SYNTAX module, the above claim to the
LESSON-22-SPEC module, and run the K prover, which should return #Top.
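Before attempting a proof, it can help to sanity-check a claim on concrete inputs. The following Python sketch mirrors the loop and compares its result against the right-hand sides of the claim for a range of values. It is a test rather than a proof, and the function names are our own; it also shows why the requires clause is needed.

```python
# Concrete sanity check of the loop claim (a test, not a proof).

def run_loop(s, n):
    # Mirrors: while ( 0 < $n ) { $s = $s + $n; $n = $n - 1; }
    while 0 < n:
        s = s + n
        n = n - 1
    return s, n

def claimed_final_store(s, n):
    # Right-hand sides of the claim: S +Int ((N +Int 1) *Int N /Int 2) and 0.
    # For N >= 0 the product is non-negative, so Python's // agrees with /Int here.
    return s + (n + 1) * n // 2, 0

# The claim holds for every tested non-negative N ...
for s in range(-5, 6):
    for n in range(0, 50):
        assert run_loop(s, n) == claimed_final_store(s, n)

# ... but not for negative N, where the loop body never runs and $n keeps
# its negative value instead of ending at 0.
print(run_loop(0, -3), claimed_final_store(0, -3))
```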

  5. Finally, our last claim is about a program that uses function calls:

claim
    <k>
        def $sum($n, .Ids) {
            $s = 0 ;
            while (0 < $n) {
                $s = $s + $n;
                $n = $n - 1;
            }
            return $s;
        }

        $s = $sum(N:Int, .Ints);
    => . ... </k>
    <funcs> .Map => ?_ </funcs>
    <store> $s |-> (_ => ((N +Int 1) *Int N /Int 2)) </store>
    <stack> .List </stack>
    requires N >=Int 0

Essentially, we have wrapped the while loop from part 3.4 into a function
$sum, and then called that function with a symbolic integer N, storing the
return value in the variable $s. The specification states that this program
ends up storing the sum of the first N integers in the variable $s. Add $sum
as a token to the LESSON-22-SPEC-SYNTAX module, the above claim to the
LESSON-22-SPEC module, and run the K prover, which should again return
#Top.

Exercises

  1. Change the condition of the if statement in part 3.2 to take the else
    branch and adjust the claim so that the proof passes.

  2. The post-condition of the specification in part 3.3 loses some information.
    In particular, the value of ?C is in fact the maximum of A and B.
    Prove the same claim as in 3.3, but with the post-condition
    ensures (?C ==Int maxInt(A, B)). For this, you will need to extend the
    VERIFICATION module with two simplifications that capture the meaning of
    maxInt(A:Int, B:Int). Keep in mind that any rewriting rule can be used as a
    simplification; in particular, simplifications can have requires
    clauses.

  3. Following the pattern shown in part 3.4, assuming a non-negative initial
    value of $b, specify and verify the following while loop:

while ( 0 < $b ) {
    $a = $a + $c;
    $b = $b - 1;
    $c = $c - 1;
}

Hint: You will not need additional simplifications; once you've got the
specification right, the proof will go through.

  4. Write an arbitrary yet not-too-complex function (or several functions
    interacting with each other), and try to specify and verify it (them) in K.

Section 2: Intermediate K Concepts

The goal of this second section is to supplement a beginning developer's
knowledge of K after they have gained a basic understanding of K. Each lesson
in this section can be completed independently in order to learn about a
particular facet of the K language. The lessons are written to provide basic
understanding of less commonly-used features of K to someone who is still
learning K. For more complete references of these features, the reader ought to
consult the User Manual.

The reader ought to be able to complete lessons in this section as needed in
order to learn about specific features of interest, but if desired, can also
complete the entire section in one go. Someone who has completed this entire
section ought to be able to read and understand most K specifications, as well
as write their own specifications of some complexity, and use them to perform
most common K-related tasks. They can then read about specific lessons in
Section 3: Advanced K Concepts if they want to
learn more.

Table of Contents

  1. Macros, Aliases, and Anywhere Rules
  2. Fresh Constants
  3. KLabels and Abstract Syntax
  4. Overloaded Symbols
  5. Matching Logic Connectives and #Or Patterns
  6. Function Context
  7. Record Productions and Named Nonterminals
  8. #fun and #let
  9. #as Patterns
  10. The Matching Operators, :=K and :/=K
  11. Uncommon Evaluation Order Concepts
  12. IEEE 754 Floating Point and Fixed Width Integers
  13. Alpha-renaming-aware Substitution
  14. File I/O
  15. String Buffers and Byte Sequences
  16. The Intermediate Language of K, KORE
  17. Debugging Proofs using the Haskell Backend REPL

Lesson 2.1: Macros, Aliases, and Anywhere Rules

The purpose of this lesson is to explain the behavior of the macro,
macro-rec, alias, and alias-rec production attributes, as well as the
anywhere rule attribute. These attributes control the meaning of how rules
associated with them are applied.

Macros

Thus far in the K tutorial, we have described three different types of rules:

  1. Top-level rewrite rules, which rewrite a configuration composed of cells to
    another configuration;
  2. Function rules, which define the behavior of a function written over
    arbitrary input and output types; and
  3. Simplification rules, which describe ways in which the symbolic execution
    engine ought to simplify terms containing symbolic values.

This lesson introduces three more types of rules, the first of which are
macros. A production is a macro if it has the macro attribute, and all
rules whose top symbol on the left hand side is a macro are macro rules
which define the behavior of the macro. Like function rules and simplification
rules, macro rules do not participate in cell completion. However, unlike
function rules and simplification rules, macro rules are applied statically
before rewriting begins, and the macro symbol is expected to no longer appear
in the initial configuration for rewriting once all macros in that
configuration are rewritten.

The rationale behind macros is that they allow you to define one piece of
syntax in terms of another piece of syntax without any runtime overhead
associated with the cost of rewriting one to the other. This process is a
common one in programming language design and specification and is referred to
as desugaring; the syntax that is transformed is typically referred to as
syntactic sugar for another type of syntax. For example, in a language with
if statements and curly braces, you could write the following fragment
(lesson-01.k):

module LESSON-01
  imports BOOL

  syntax Stmt ::= "if" "(" Exp ")" Stmt             [macro]
                | "if" "(" Exp ")" Stmt "else" Stmt
                | "{" Stmts "}"
  syntax Stmts ::= List{Stmt,""}
  syntax Exp ::= Bool

  rule if ( E ) S => if ( E ) S else { .Stmts }
endmodule

In this example, we see that an if statement without an else clause is
defined in terms of one with an else clause. As a result, we would only
need to give a single rule for how to rewrite if statements, rather than
two separate rules for two types of if statements. This is a common pattern
for dealing with program syntax that contains an optional component to it.

It is worth noting that by default, macros are not applied recursively. To be
more precise, by default a macro that arises as a result of the expansion of
the same macro is not rewritten further. This is primarily to simplify the
macro expansion process and reduce the risk that improperly defined macros will
lead to non-terminating behavior.

It is possible, however, to tell K to expand a macro recursively. To do this,
simply replace the macro attribute with the macro-rec attribute. Note that
K does not do any kind of checking to ensure termination here, so it is
important that rules be defined correctly to always terminate, otherwise the
macro expansion phase will run forever. Fortunately, in practice it is very
simple to ensure this property for most of the types of macros that are
typically used in real-world semantics.

Exercise

Using a Nat sort containing the constructors 0 and S (i.e., a Peano-style
axiomatization of the natural numbers where S(N) = N + 1, S(S(N)) = N + 2,
etc.), write a macro that will compute the sum of two numbers.

Aliases

NOTE: This lesson introduces the concept of "aliases", which are a variant
of macros. While similar, this is different from the concept of "aliases" in
matching logic, which is introduced in Lesson 2.16.

Macros can be very useful in helping you define a programming language.
However, they can be disruptive while pretty printing a configuration. For
example, you might write a set of macros that transforms the code the user
wrote into equivalent code that is slightly harder to read. This can make it
more difficult to understand the code when it is pretty printed as part of the
output of rewriting.

K defines a relatively straightforward but novel solution to this problem,
which is known as a K alias. An alias in K is very similar to a macro,
with the exception that the rewrite rule will also be applied backwards
during the pretty-printing process.

It is very simple to make a production be an alias instead of a macro: simply
use the alias or alias-rec attributes instead of the macro or macro-rec
attributes. For example, if the example involving if statements above was
declared using an alias instead of a macro, the Stmt term if (E) {} else {}
would be pretty-printed as if (E) {}. This is because during pretty-printing,
the term participates in another macro-expansion pass. However, this macro
expansion step will only apply rules with the alias or alias-rec attribute,
and, critically, it will reverse the rule by treating the left-hand side as if
it were the right-hand side, and vice versa.

This can be very useful to allow you to define one construct in terms of
another while still being able to pretty-print the result as if it were
the original term in question. This can be especially useful for applications
of K where we are taking the output of rewriting and attempting to use it as
a code fragment that we then execute, such as with test generation.
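The idea of running the same rule forwards before rewriting and backwards during pretty-printing can be sketched in Python. This is a conceptual illustration only, not how K implements aliases; the tuple encoding of terms ("if", "if_else", "block") and all function names are assumptions made for this example.

```python
# Conceptual sketch of an alias: one rule, applied left-to-right before
# rewriting and right-to-left when pretty-printing.
# ASSUMED encoding: ("if", cond, body), ("if_else", cond, then, else), ("block", stmt...).

EMPTY_BLOCK = ("block",)

def rewrite_everywhere(term, rule):
    """Apply `rule` bottom-up to every subterm; `rule` returns its input when it does not match."""
    if isinstance(term, tuple):
        head, *args = term
        term = (head, *(rewrite_everywhere(a, rule) for a in args))
    return rule(term)

def forward(term):
    # if ( E ) S => if ( E ) S else { }
    if isinstance(term, tuple) and term[0] == "if":
        _, cond, body = term
        return ("if_else", cond, body, EMPTY_BLOCK)
    return term

def backward(term):
    # The same rule read right-to-left, as done during pretty-printing of an alias.
    if isinstance(term, tuple) and term[0] == "if_else" and term[3] == EMPTY_BLOCK:
        _, cond, body, _ = term
        return ("if", cond, body)
    return term

program = ("if", True, ("block", ("if", False, EMPTY_BLOCK)))
expanded = rewrite_everywhere(program, forward)
printed = rewrite_everywhere(expanded, backward)
print(expanded)
print(printed)
```

In this sketch, an if-else term that the user wrote with an explicitly empty else block is also contracted by backward, which matches the pretty-printing behavior described above.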

Exercise

Modify LESSON-01 above to use an alias instead of a macro and experiment
with how various terms are pretty-printed by invoking krun on them.

anywhere rules

The last type of rule introduced in this lesson is the anywhere rule. An
anywhere rule is specified by adding the anywhere attribute to a rule. Such a
rule is similar to a function rule in that it does not participate in cell
completion, and will apply anywhere that the left-hand-side matches in the
configuration, but distinct in that the symbol in question can still be matched
against in the left-hand side of other rules, even during concrete rewriting.
The reasoning behind this is that instead of the symbol in question being a
constructor, it is a constructor modulo the axioms defined with the
anywhere attribute. Essentially, the rules with the anywhere attribute will
apply as soon as the symbol appears in the right-hand side of a rule being
applied, but the symbol will still be treated as one that can be matched on
if it is not completely removed by those rules.

This can be useful in certain cases to allow you to define transformations over
particular pieces of syntax while still generally giving those pieces of syntax
another meaning when the anywhere rule does not apply. For example, the ISO C
standard defines the semantics of *&x as exactly equal to x, with no
reading or writing of memory taking place, and the K semantics of C implements
this functionality using an anywhere rule that is applied at compilation time.
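As a rough illustration of the *&x example, the Python sketch below simplifies a dereference of an address-of wherever it occurs in a term, while leaving other dereferences in place as ordinary constructors. It is not taken from the K semantics of C; the tuple encoding ("deref", "addr") and the function name are hypothetical.

```python
# Sketch of an anywhere-style simplification: deref(addr(e)) => e.
# ASSUMED encoding: ("deref", e) stands for *e and ("addr", e) stands for &e.

def simplify_deref_addr(term):
    """Rewrite deref(addr(e)) to e wherever it occurs, working bottom-up."""
    if not isinstance(term, tuple):
        return term
    head, *args = term
    args = [simplify_deref_addr(a) for a in args]
    if (head == "deref" and len(args) == 1
            and isinstance(args[0], tuple) and args[0][0] == "addr"):
        return args[0][1]
    return (head, *args)

# The rule fires deep inside a larger term ...
print(simplify_deref_addr(("add", ("deref", ("addr", "x")), 1)))
# ... while a dereference of a plain pointer stays a deref term, so other
# rules can still match on it.
print(simplify_deref_addr(("deref", "p")))
```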

NOTE: the anywhere attribute is only implemented on the LLVM backend
currently. Attempting to use it in a semantics that is compiled with the
Haskell backend will result in an error being reported by the compiler. This
should be remembered when using this attribute, as it may not be suitable for
a segment of a semantics which is intended to be symbolically executed.

Exercises

  1. Write a version of the calculator from Lesson 1.14 Exercise 1, which uses
    the same syntax for evaluating expressions, but defines its arithmetic logic
    using anywhere rules rather than top-level rewrite rules.

Return to Top

Click here to return to the Table of Contents for Section 2.

Lesson 2.2: Fresh Constants

Lesson 2.3: KLabels and Abstract Syntax

Lesson 2.4: Overloaded Symbols

Lesson 2.5: Matching Logic Connectives and #Or Patterns

Lesson 2.6: Function Context

Lesson 2.7: Record Productions and Named Nonterminals

Lesson 2.8: #fun and #let

Lesson 2.9: #as Patterns

Lesson 2.10: The Matching Operators, :=K and :/=K

Lesson 2.11: Uncommon Evaluation Order Concepts

Lesson 2.12: IEEE 754 Floating Point and Fixed Width Integers

Lesson 2.13: Alpha-renaming-aware Substitution

Lesson 2.14: File I/O

Lesson 2.15: String Buffers and Byte Sequences

Lesson 2.16: The Intermediate Language of K, KORE

Lesson 2.17: Debugging Proofs using the Haskell Backend REPL

K User Manual

NOTE: The K User Manual is still under construction; some features of K
may have partial or missing documentation.

Introduction

Why K?

The K Framework is a programming language and system design toolkit made for
practitioners and researchers alike.

K For Practitioners:
K is a framework for deriving programming language tools from their semantic
specifications.

Typically, programming language tool development follows a similar pattern.
After a new programming language is designed, separate teams will develop
separate language tools (e.g. a compiler, interpreter, parser, symbolic
execution engine, etc). Code reuse is uncommon. The end result is that for each
new language, the same basic tools and patterns are re-implemented again and
again.

K approaches the problem differently: it generates each of these tools from a
single language specification. Programming language design and tool
implementation become separate concerns. As a result, designing a new language
and its associated tooling reduces to developing a single language
specification, from which the tooling is derived for free.

K For Researchers:
K is a configuration- and rewrite-based executable semantic framework.

In more detail, K specifications are:

  1. Executable: compile into runnable and testable programs;
  2. Semantic: correspond to a logical theory with a sound and relatively
    complete proof system;
  3. Configuration-based: organize system states into compositional,
    hierarchical, labelled units called cells;
  4. Rewrite-based: define system transitions using rewrite rules.

K specifications are compiled into particular matching logic theories, giving
them a simple and expressive semantics. K semantic rules are implicitly defined
over the entire configuration structure, but omit unused cells, enabling a
highly modular definitional style. Furthermore, K has been used to develop
programming languages, type systems, and formal analysis tools.

Manual Objectives

As mentioned in the Why K? section above, the K Framework is designed as a
collection of language-generic command-line interface (CLI) tools which revolve
around K specifications. These tools cover a broad range of uses, but they
typically fall into one of the following categories:

  1. Transforming K Specs (e.g. compilation)
  2. Running K Specs (e.g. concrete and symbolic execution)
  3. Analyzing K Specs (e.g. theorem proving)

The main user-facing K tools include:

  • kompile - the K compiler driver
  • kparse - the standalone K parser and abstract syntax tree (AST)
    transformation tool
  • krun - the K interpreter and symbolic execution engine driver
  • kprove - the K theorem prover

This user manual is designed to be a tool reference.
In particular, it is not designed to be a tutorial on how to write K
specifications or to teach the logical foundations of K. New K users should
consult our dedicated
K tutorial,
or the more language-design oriented
PL tutorial.
Researchers seeking to learn more about the logic underlying K are encouraged
to peruse the
growing literature on K and matching logic.
We will consider the manual complete when it provides a complete description of
all user-facing K tools and features.

Introduction to K

Since K specifications are the primary input into the entire system, let us
take a moment to describe them. At the highest level, K specifications describe
a programming language or system using three different pieces:

  1. the system primitives, the base datatypes used during system operation,
    e.g., numbers, lists, maps, etc;
  2. the system state, a tuple or record over system primitives which gives a
    complete snapshot of the system at any given moment;
  3. the system behavior, a set of rules which defines possible system
    evolutions.

K specifications are then defined by a collection of sentences which
correspond to the three concepts above:

  1. syntax declarations encode the system primitives;
  2. configuration declarations encode the system state;
  3. context and rule declarations encode the system behavior.

K sentences are then organized into one or more modules which are stored in one or
more files. In this scheme, files may require other files and modules may
import other modules, giving rise to a hierarchy of files and modules. We
give an intuitive sketch of the two levels of grouping in the diagram below:

   example.k file
  +=======================+
  | requires ".." --------|--> File_1
  | ...                   |
  | requires ".." --------|--> File_N
  |                       |
  |  +-----------------+  |
  |  | module ..       |  |
  |  |   imports .. ---|--|--> Module_1
  |  |   ...           |  |
  |  |   imports .. ---|--|--> Module_M
  |  |                 |  |
  |  |   sentence_1    |  |
  |  |   ...           |  |
  |  |   sentence_K    |  |
  |  | endmodule       |  |
  |  +-----------------+  |
  |                       |
  +=======================+

where:

  • files and modules are denoted by double-bordered and single-bordered boxes
    respectively;
  • file or module identifiers are denoted by double dots (..);
  • potential repetitions are denoted by triple dots (...).

In the end, we require that the file and module hierarchies both form a
directed acyclic graph (DAG). That is, no file may recursively require itself,
and likewise, no module may recursively import itself.
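
The two levels of grouping can be sketched with a small file. The file name example.k and the module names EXAMPLE-SYNTAX and EXAMPLE are hypothetical; domains.md, INT-SYNTAX, and INT are assumed to be the builtin file and modules shipped with K:

```k
// example.k (hypothetical file name)
requires "domains.md"     // file level: another file this file depends on

module EXAMPLE-SYNTAX
  imports INT-SYNTAX      // module level: brings in integer literals
  syntax Exp ::= Int
               | Exp "+" Exp [left]
endmodule

module EXAMPLE
  imports EXAMPLE-SYNTAX  // EXAMPLE -> EXAMPLE-SYNTAX -> INT-SYNTAX
  imports INT             // provides +Int
  rule I1:Int + I2:Int => I1 +Int I2
endmodule
```

Here the import graph is acyclic: EXAMPLE depends on EXAMPLE-SYNTAX and INT, and nothing imports EXAMPLE back.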

We now zoom in further to discuss the various kinds of sentences contained in K
specifications:

  1. sentences that define our system's primitives, including:

    • sort declarations: define new categories of primitive datatypes
    • Backus-Naur Form (BNF) grammar declarations: define the
      operators that inhabit our primitive datatypes
    • lexical syntax declarations: define lexemes/tokens for the
      lexer/tokenizer
    • syntax associativity declarations: specify the
      associativity/grouping of our declared operators
    • syntax priority declarations: specify the priority of
      potentially ambiguous operators
  2. sentences that define our system's state, including:

    • configuration declarations: define labelled, hierarchical records
      using a nested XML-like syntax
  3. sentences that define our system's behavior, including:

    • context declarations: describe how primitives and configurations
      can simplify
    • context alias declarations: define templates that can generate new
      contexts
    • rule declarations: define how the system transitions from one state
      to the next
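
The sentence kinds listed above can be seen side by side in one small module. This is only a sketch: the module name, the Exp and Name sorts, and the symbol names mul and add are made up for illustration, and context aliases are omitted.

```k
module SENTENCE-KINDS
  imports INT

  // 1. System primitives
  syntax Exp                                  // sort declaration
  syntax Exp ::= Int                          // BNF grammar declarations
               | Exp "*" Exp [symbol(mul), strict]
               | Exp "+" Exp [symbol(add)]
  syntax Name ::= r"[a-z][a-z0-9]*" [token]   // lexical syntax declaration
  syntax left mul                             // associativity declarations
  syntax left add
  syntax priority mul > add                   // priority declaration
  syntax KResult ::= Int                      // what counts as fully evaluated

  // 2. System state
  configuration <T> <k> $PGM:Exp </k> </T>    // configuration declaration

  // 3. System behavior
  context HOLE + _                            // context declarations
  context _:Int + HOLE
  rule I1:Int * I2:Int => I1 *Int I2          // rule declarations
  rule I1:Int + I2:Int => I1 +Int I2
endmodule
```

The Name sort is declared only to show a lexical declaration and is not otherwise used in this sketch.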

K Process Overview

We now examine how the K tools are generally used. The main input to all of the
K tools is a K specification. For efficiency reasons, this specification is
first compiled into an intermediate representation called Kore. Once we have
obtained this intermediate representation, we can use it to do:

  1. parsing/pretty-printing, i.e., converting a K term, whose syntax is defined
    by a K specification, into an alternate representation
  2. concrete and abstract execution of a K specification
  3. theorem proving, i.e., verifying whether a set of claims about a K
    specification hold

We represent the overall process using the graphic below:

 K Compilation Process
+============================================================+
|                     +---------+                            |
|  K Specification ---| kompile |--> Kore Specification --+  |
|                     +---------+                         |  |
+=========================================================|==+
                                                          |
 K Execution Process                                      |
+=========================================================|==+
|                                                         |  |
|             +-------------------------------------------+  |
|             |                                              |
|             |       +---------+                            |
|  K Term ----+-------| kparse  |--> K Term                  |
|             |       +---------+                            |
|             |                                              |
|             |       +---------+                            |
|  K Term ----+-------|  krun   |--> K Term                  |
|             |       +---------+                            |
|             |                                              |
|             |       +---------+                            |
|  K Claims --+-------| kprove  |--> K Claims                |
|                     +---------+                            |
|                                                            |
+============================================================+

where:

  • process outlines are denoted by boxes with double-lined borders
  • executables are denoted by boxes with single-lined borders
  • inputs and outputs are denoted by words attached to lines
  • K terms typically correspond to programs defined in a particular
    language's syntax (which are either parsed using kparse or executed using
    krun)
  • K claims are a notation for describing how certain K programs should
    execute (which are checked by our theorem prover kprove)

K Compilation Process:
Let us start with a description of the compilation process. According to the
above diagram, the compiler driver is called kompile. For our purposes, it is
enough to view the K compilation process as a black box that transforms a K
specification into a lower-level Kore specification that encodes the same
information, but that is easier to work with programmatically.

K Execution Process:
We now turn our attention to the K execution process. Abstractly, we can divide
the K execution process into the following stages:

  1. the Kore specification is loaded (which defines a lexer, parser, and
    unparser among other things)
  2. the input string is lexed into a token stream
  3. the token stream is parsed into K terms/claims
  4. the K term/claims are transformed according to the K tool being used (e.g.
    kparse, krun, or kprove)
  5. the K term/claims are unparsed into a string form and printed

Note that all of the above steps performed in the K execution process are fully
prescribed by the input K specification. Of course, there are entire languages
devoted to encoding each of these stages individually, e.g., flex for
lexers, bison for parsers, etc. What K offers is a consistent language to
package the above concepts in a way that we believe is convenient and practical
for a wide range of uses.

Module Declaration

K modules are declared at the top level of a K file. They begin with the
module keyword and are followed by a module ID and an optional set of
attributes. They continue with zero or more imports and zero or more sentences
until the endmodule keyword is reached.

A module ID consists of an optional # at the beginning, followed by one or
more components separated by hyphens. Each component can contain letters,
numbers, or underscores.

After the module ID, attributes can be specified in square brackets. See below
for an (incomplete) list of allowed module attributes.

Following the attributes, a module can contain zero or more imports. An
import consists of the import or imports keywords followed by a module ID.
An import tells the compiler that this module should contain all the sentences
(recursively) contained by the module being imported.

Imports can be public or private. By default, they are public, which
means that all the imported syntax can be used by any module that imports the
module doing the import. However, you can explicitly override the visibility
of the import with the public or private keyword immediately prior to the
module name. A module imported privately does not export its syntax to modules
that import the module doing the import.

Following imports, a module can contain zero or more sentences. A sentence can
be a syntax declaration, a rule, a configuration declaration, a context, a
claim, or a context alias. Details on each of these can be found in subsequent
sections.
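
As a hedged illustration of import visibility (all module names and the double/twice productions are hypothetical), the following sketch uses a private import to keep a helper function out of downstream modules:

```k
module HELPERS
  imports INT
  syntax Int ::= double(Int) [function, total]
  rule double(I) => I *Int 2
endmodule

module LANG
  imports private HELPERS   // double(_) is usable inside LANG only
  imports public  INT       // equivalent to plain `imports INT`
  syntax Exp ::= Int
               | twice(Exp)
  rule twice(I:Int) => double(I)
endmodule

module MAIN
  imports LANG
  // twice(_) and the INT syntax are visible here; following the rule
  // described above, the syntax of HELPERS is not re-exported by LANG.
endmodule
```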

private attribute

If the module is given the private attribute, all of its imports and syntax
are private by default. Individual pieces of syntax can be made public with
the public attribute, and individual imports can be made public with the
public keyword. See relevant sections on syntax and modules for more details
on what it means for syntax and imports to be public or private.
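
A minimal sketch of a private module, assuming the attribute placement described above (the module and production names are hypothetical):

```k
module INTERNALS [private]
  imports INT                                    // private by default here
  syntax Int ::= helper(Int) [function]          // private by default
  syntax Int ::= api(Int)    [function, public]  // explicitly exported
  rule helper(I) => I +Int 1
  rule api(I)    => helper(I) *Int 2
endmodule
```

A module importing INTERNALS would then be expected to see api(_) but not helper(_).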

symbolic and concrete attribute

These attributes may be placed on modules to indicate that they should only
be used by the Haskell and LLVM backends respectively. If the definition is
compiled on the opposite backend, they are implicitly removed from the
definition prior to parsing anywhere they are imported. This can be useful when
used in limited capacity in order to provide alternate semantics for certain
features on different backends. It should be used sparingly as it makes it more
difficult to trust the correctness of your semantics, even in the presence of
testing.
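
One way this could be used is sketched below: a function is declared once in a shared module and given backend-specific rules. The module names and the myAbs function are hypothetical; absInt is assumed to be the absolute-value function from the builtin INT module.

```k
module ABS-COMMON
  imports INT
  syntax Int ::= myAbs(Int) [function, total]
endmodule

// Kept only when compiling for the Haskell backend
module ABS-SYMBOLIC [symbolic]
  imports ABS-COMMON
  rule myAbs(I) => I        requires I >=Int 0
  rule myAbs(I) => 0 -Int I requires I <Int 0
endmodule

// Kept only when compiling for the LLVM backend
module ABS-CONCRETE [concrete]
  imports ABS-COMMON
  rule myAbs(I) => absInt(I)
endmodule

module ABS
  imports ABS-SYMBOLIC
  imports ABS-CONCRETE
endmodule
```

Because the two rule sets are never compiled together, nothing checks that they agree, which is the correctness risk noted above.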

Syntax Declaration

Named Non-Terminals

We have added a syntax to Productions which allows non-terminals to be given a
name in productions. This significantly improves the ability to document K, by
providing a way to explicitly explain what a field in a production corresponds
to instead of having to infer it from a comment or from the rule body.

The syntax is:

name: Sort

This syntax can be used anywhere in a K definition that expects a non-terminal.
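
For instance, a production with three named fields could be written as follows (the module, the Stmt sort, and the transfer production are hypothetical):

```k
module NAMED-NONTERMINALS-SKETCH
  imports INT-SYNTAX
  imports ID-SYNTAX

  // Each non-terminal carries a name documenting its role
  syntax Stmt ::= transfer(from: Id, to: Id, amount: Int)
endmodule
```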

symbol(_) attribute

By default, when compiling a definition, K generates a unique "mangled" label
identifier for each syntactic production. These identifiers can be used to
reference productions externally, for example when constructing terms by hand
or programmatically via pyk.

The symbol(_) attribute can be applied to a production to control the precise
identifier for a production that appears in a compiled definition. For example:

module SYMBOLS
    syntax Foo ::= foo() [symbol(foo)]
                 | bar()
endmodule

Here, the compiled definition will contain the following symbol declarations:

  symbol Lblfoo{}() ...
  symbol Lblbar'LParRParUnds'SYMBOLS'Unds'Foo{}() ...

The compiler enforces uniqueness[1] of symbol names specified in
this way; it would be an error to apply symbol(foo) to another production in
the module above. Additionally, symbol(_) with an argument may not co-occur
with the klabel(_) attribute (see below).

overload attribute

K supports subsort overloading[2] on symbols, whereby a
constructor can have a more specific sort for certain arguments. For example,
consider the following productions derived from a C-like language semantics:

syntax Exp  ::= LVal
              | Exp  "." Id
syntax LVal ::= LVal "." Id

Here, it is useful for the result of the dot operator to be an LVal if the
left-hand side is itself an LVal. However, there is an issue with the code
as written: if L() is a term of sort LVal, then the program L() . x has a
parsing ambiguity between the two productions for the dot operator. To resolve
this, we can mark the productions as overloads:

syntax Exp  ::= LVal
              | Exp  "." Id [overload(_._)]
syntax LVal ::= LVal "." Id [overload(_._)]

Now, the parser will select the most specific overloaded production when it
resolves ambiguities in L() . x (that is, L() . x parses to a term of sort
LVal).

Formally, the compiler organises productions into a partial order that defines
the overload relation as follows. We say that P is a more specific overload
of Q if:

  • P and Q have the same overload(_) attribute. Note that the argument
    supplied has no semantic meaning other than as a key grouping productions
    together.
  • Let S_P be the sort of P, and S_p1 etc. be the sorts of its arguments
    (and likewise for Q). The tuple (S_P, S_p1, ..., S_pN) must be elementwise
    less than or equal to (S_Q, S_q1, ..., S_qN) according to the definition's
    subsorting relationship, and the two tuples must differ (in the example
    above, Id is unchanged while LVal is a subsort of Exp). That is, a term
    from production P is a restriction of one from production Q; when its
    arguments are more precise, we can give the result a more precise sort.

klabel(_) and symbol attributes

Note: the klabel(_), symbol approach described in this section is a legacy
feature that will be removed in the future. New code should use the symbol(_)
and overload(_) attributes to opt into explicit naming and overloading
respectively.

References here to "overloading" are explained in the section above; the use
of the klabel(_) attribute without symbol is equivalent to the new
overload(_) syntax.

By default K generates for each syntax definition a long and obfuscated klabel
string, which serves as a unique internal identifier and also is used in kast
format of that syntax. If we need to reference a certain syntax production
externally, we have to manually define the klabels using the klabel attribute.
One example of where you would want to do this is to be able to refer to a given
symbol via the syntax priority attribute, or to enable overloading of a
given symbol.

If you only provide the klabel attribute, you can use the provided klabel to
refer to that symbol anywhere in the frontend K code. However, the internal
identifier seen by the backend for that symbol will still be the long obfuscated
generated string. Sometimes you want control over the internal identifier used as
well, in which case you use the symbol attribute. This tells the frontend to
use whatever the declared klabel is directly as the internal identifier.

For example:

module MYMODULE
    syntax FooBarBaz ::= #Foo( Int, Int ) [klabel(#Foo), symbol] // symbol1
                       | #Bar( Int, Int ) [klabel(#Bar)]         // symbol2
                       | #Baz( Int, Int )                        // symbol3
endmodule

Here, we have that:

  • In frontend K, you can refer to "symbol1" as #Foo (from klabel(#Foo)),
    and the backend will see 'Hash'Foo as the symbol name.
  • In frontend K, you can refer to "symbol2" as #Bar (from klabel(#Bar)),
    and the backend will see
    'Hash'Bar'LParUndsCommUndsRParUnds'MYMODULE'Unds'FooBarBaz'Unds'Int'Unds'Int
    as the symbol name.
  • In frontend K, you can refer to "symbol3" as
    #Baz(_,_)_MYMODULE_FooBarBaz_Int_Int (from auto-generated klabel), and
    the backend will see
    'Hash'Baz'LParUndsCommUndsRParUnds'MYMODULE'Unds'FooBarBaz'Unds'Int'Unds'Int
    as the symbol name.

The symbol provided must be unique to this definition. This is enforced by
K. In general, it's recommended to use the symbol attribute whenever you use
klabel unless you explicitly have a reason not to (e.g. you want to overload
symbols, or you're using a deprecated backend). It can be very helpful to use the
symbol attribute for debugging, as many debugging messages are printed in
Kast format which will be more readable with the symbol names you explicitly
declare. In addition, if you are programmatically manipulating definitions via
the JSON Kast format, building terms using the names you provide via
symbol, klabel(...) is easier, and less error-prone if the auto-generation
process for klabels changes.

Syntactic Lists

When using K's support for syntactic lists, a production like:

syntax Ints ::= List{Int, ","} [symbol(ints)]

will desugar into two productions:

syntax Ints ::= Int "," Ints [symbol(ints)]
syntax Ints ::= ".Ints"      [symbol(List{"ints"})]

Note that the symbol for the terminator of the list has been generated
automatically from the label on the original production. It is possible to
control what the terminator's label is using the terminator-symbol(_)
attribute. For example:

syntax Ints ::= List{Int, ","} [symbol(ints), terminator-symbol(.ints)]

will desugar into two productions:

syntax Ints ::= Int "," Ints [symbol(ints)]
syntax Ints ::= ".Ints"      [symbol(.ints)]

It is an error to apply terminator-symbol(_) to a non-production sentence, or
to a production that does not declare a syntactic list.

Parametric productions and bracket attributes

Some syntax productions, like the rewrite operator, the bracket operator, and
the #if #then #else #fi operator, cannot have their precise type system
expressed using only concrete sorts.

Prior versions of K solved this issue by using the K sort in this case, but
this introduces inexactness in which poorly typed terms can be created even
without having a cast operator present in the syntax, which is a design
consideration we would prefer to avoid.

It also introduces cases where terms cannot be placed in positions where they
ought to be well sorted unless their return sort is made to be KBott, which in
turn vastly complicates the grammar and makes parsing much slower.

In order to address this, we provide a new syntax for parametric productions
in K. This allows you to express syntax that has a sort signature based on
parametric polymorphism. We do this by means of an optional curly-brace-
enclosed list of parameters prior to the return sort of a production.

Some examples:

syntax {Sort} Sort ::= "(" Sort ")" [bracket]
syntax {Sort} KItem ::= Sort
syntax {Sort} Sort ::= KBott
syntax {Sort} Sort ::= Sort "=>" Sort
syntax {Sort} Sort ::= "#if" Bool "#then" Sort "#else" Sort "#fi"
syntax {Sort1, Sort2} Sort1 ::= "#fun" "(" Sort2 "=>" Sort1 ")" "(" Sort2 ")"

Here we have:

  1. Brackets, which can enclose any sort but should be of the same sort that was
    enclosed.
  2. Every sort is a KItem.
  3. A KBott term can appear inside any sort.
  4. Rewrites, which can rewrite a value of any sort to a value of the same sort.
    Note that this allows the lhs or rhs to be a subsort of the other.
  5. If then else, which can return any sort but which must contain that sort on
    both the true and false branches.
  6. Lambda applications, in which the argument and parameter must be the same
    sort, and the return value of the application must be the same sort as the
    return value of the function.

Note the last case, in which two different parameters are specified separated
by a comma. This indicates that we have multiple independent parameters which
must be the same each place they occur, but not the same as the other
parameters.

In practice, because every sort is a subsort of K, the Sort2
parameter in #6 above does nothing during parsing. It cannot
actually reject any parse, because it can always infer that the sort of the
argument and parameter are K, and it has no effect on the resulting sort of
the term. However, it will nevertheless affect the Kore generated from the term
by introducing an additional parameter to the symbol generated for the term.

function and total attributes

Many times it becomes easier to write a semantics if you have "helper"
functions written which can be used in the RHS of rules. The function
attribute tells K that a given symbol should be simplified immediately when it
appears anywhere in the configuration. Semantically, it means that evaluation
of that symbol will result in at most one return value (that is, the symbol is
a partial function).

The total attribute indicates that a symbol cannot be equal to matching logic
bottom; in other words, it has at least one value for every possible set of
arguments. It can be added to a production with the function attribute to
indicate to the symbolic reasoning engine that a given symbol is a
total function, that is it has exactly one return value for every possible
input. Other uses of the total attribute (i.e., on multi-valued symbols to
indicate they always have at least one value) are not yet implemented.

For example, here we define the _+Word_ total function and the _/Word_
partial function, which can be used to do addition/division modulo
2 ^Int 256. These functions can be used anywhere in the semantics where
integers should not grow larger than 2 ^Int 256. Notice how _/Word_ is
not defined when the denominator is 0.

syntax Int ::= Int "+Word" Int [function, total]
             | Int "/Word" Int [function]

rule I1 +Word I2 => (I1 +Int I2) modInt (2 ^Int 256)
rule I1 /Word I2 => (I1 /Int I2) modInt (2 ^Int 256) requires I2 =/=Int 0

freshGenerator attribute

In K, you can access "fresh" values in a given domain using the syntax
!VARNAME:VarSort (with the !-prefixed variable name). This is supported for
builtin sorts Int and Id already. For example, you can generate fresh
memory locations for declared identifiers as such:

rule <k> new var X:Id ; => .K ... </k>
     <env> ENV => ENV [ X <- !I:Int ] </env>
     <mem> MEM => MEM [ !I <- 0     ] </mem>

Each time a !-prefixed variable is encountered, a new integer will be used,
so each variable declared with new var _ ; will get a unique position in the
<mem>.

Sometimes you want to have generation of fresh constants in a user-defined
sort. For this, K will still generate a fresh Int, but can use a converter
function you supply to turn it into the correct sort. For example, here we can
generate fresh Foos using the freshFoo(_) function annotated with
freshGenerator.

syntax Foo ::= "a" | "b" | "c" | d ( Int )

syntax Foo ::= freshFoo ( Int ) [freshGenerator, function, total]

rule freshFoo(0) => a
rule freshFoo(1) => b
rule freshFoo(2) => c
rule freshFoo(I) => d(I) [owise]

rule <k> new var X:Id ; => .K ... </k>
     <env> ENV => ENV [ X <- !I:Int  ] </env>
     <mem> MEM => MEM [ !I <- !F:Foo ] </mem>

Now each newly allocated memory slot will have a fresh Foo placed in it.

token attribute

The token attribute signals to the Kore generator that the associated sort
will be inhabited by domain values. Sorts inhabited by domain values must not
have any constructors declared.

syntax Bytes [hook(BYTES.Bytes), token]

Converting between [token] sorts

You can convert tokens of one sort into tokens of another sort via Strings by
defining functions implemented by builtin hooks.
The hook STRING.token2string allows conversion of any token to a string:

syntax String ::= FooToString(Foo)  [function, total, hook(STRING.token2string)]

Similarly, the hook STRING.string2token allows the inverse:

syntax Bar ::= StringToBar(String) [function, total, hook(STRING.string2token)]

WARNING: This sort of conversion does NOT do any sort of parsing or validation.
Thus, we can create arbitrary tokens of any sort:

StringToBar("The sun rises in the west.")

Composing these two functions lets us convert from Foo to Bar:

syntax Bar ::= FooToBar(Foo) [function]
rule FooToBar(F) => StringToBar(FooToString(F))

Parsing comments, and the #Layout sort

Productions for the #Layout sort are used to describe tokens that are
considered "whitespace". The scanner removes tokens matching these productions
so they are not even seen by the parser. Below, we use it to define
lines beginning with ; (semicolon) as comments.

syntax #Layout ::= r"(;[^\\n\\r]*)"    // Semi-colon comments
                 | r"([\\ \\n\\r\\t])" // Whitespace

prec attribute

Consider the following naive attempt at creating a language syntax that
allows two types of variables: names that contain underbars, and names that
contain sharps/hashes/pound-signs:

syntax NameWithUnderbar ::= r"[a-zA-Z][A-Za-z0-9_]*"  [token]
syntax NameWithSharp    ::= r"[a-zA-Z][A-Za-z0-9_#]*" [token]
syntax Pgm ::= underbar(NameWithUnderbar)
             | sharp(NameWithSharp)

Although it seems that K has enough information to parse the programs
underbar(foo) and sharp(foo), the lexer does not take into account
whether a token is being parsed for the sharp or for the underbar
production. It chooses an arbitrary sort for the token foo (perhaps
NameWithUnderbar). Thus, during parsing, it is unable to construct a valid term
for one of those programs (sharp(foo)) and produces the error message:
Inner Parser: Parse error: unexpected token 'foo'.

Since calculating inclusions and intersections between regular expressions is
tricky, we must provide this information to K. We do this via the prec(N)
attribute. The lexer will always prefer longer tokens to shorter tokens.
However, when it has to choose between two different tokens of equal length,
token productions with higher precedence are tried first. Note that the default
precedence value is zero when the prec attribute is not specified.

For example, the BUILTIN-ID-TOKENS module defines #UpperId and #LowerId with
the prec(2) attribute.

  syntax #LowerId ::= r"[a-z][a-zA-Z0-9]*"                    [prec(2), token]
  syntax #UpperId ::= r"[A-Z][a-zA-Z0-9]*"                    [prec(2), token]

Furthermore, we also need to make sorts with more specific tokens subsorts of
ones with more general tokens. We add the token attribute to this production so
that all tokens of a particular sort are marked with the sort they are parsed
as and not a subsort thereof. For example, we get
underbar(#token("foo", "NameWithUnderbar")) instead of
underbar(#token("foo", "#LowerId")).

imports BUILTIN-ID-TOKENS
syntax NameWithUnderbar ::= r"[a-zA-Z][A-Za-z0-9_]*" [prec(1), token]
                          | #UpperId                [token]
                          | #LowerId                [token]
syntax NameWithSharp ::= r"[a-zA-Z][A-Za-z0-9_#]*" [prec(1), token]
                       | #UpperId                 [token]
                       | #LowerId                 [token]
syntax Pgm ::= underbar(NameWithUnderbar)
             | sharp(NameWithSharp)

unused attribute

K will warn you if you declare a symbol that is not used in any of the rules of
your definition. Sometimes this is intentional, however; in this case, you can
suppress the warning by adding the unused attribute to the production or
cell.

syntax Foo ::= foo() [unused]

configuration <foo unused=""> .K </foo>

deprecated attribute

Symbols can be marked as deprecated by adding the deprecated attribute to
their declaration. If that symbol subsequently appears in the definition (in a
rule, context, context alias or configuration), the compiler will issue a
warning.

syntax Foo ::= foo() [deprecated]
rule foo() => . // warning on this line

Symbol priority and associativity

Unlike most other parser generators, K combines the task of parsing with AST
generation. A production declared with the syntax keyword in K is both a
piece of syntax used when parsing, and a symbol that is used when rewriting.
As a result, it is generally convenient to describe expression grammars using
priority and associativity declarations rather than explicitly transforming
your grammar into a series of nonterminals, one for each level of operator
precedence. Thus, for example, a simple grammar for addition and multiplication
will look like this:

syntax Exp ::= Exp "*" Exp
             | Exp "+" Exp

However, this grammar is ambiguous. The term x+y*z might refer to x+(y*z)
or to (x+y)*z. In order to differentiate this, we introduce a partial
ordering between productions known as priority. A symbol "has tighter priority"
than another symbol if the first symbol can appear under the second, but the
second cannot appear under the first without a bracket. For example, in
traditional arithmetic, multiplication has tighter priority than addition,
which means that x+y*z cannot parse as (x+y)*z because the addition
operator would appear directly beneath the multiplication, which is forbidden
by the priority filter.

Priority is applied individually to each possible ambiguous parse of a term. It
then either accepts or rejects that parse. If there is only a single remaining
parse (after all the other disambiguation steps have happened), this is the
parse that is chosen. If all the parses were rejected, it is a parse error. If
multiple parses remain, they might be resolved by further disambiguation such
as via the prefer and avoid attributes, but if multiple parses remain after
disambiguation finishes, this is an ambiguous parse error, indicating there is
not a unique parse for that term. In the vast majority of cases, this is
an error and indicates that you ought to either change your grammar or add
brackets to the term in question.

Priority is specified in K grammars by means of one of two different
mechanisms. The first, and simplest, simply replaces the | operator in a
sequence of K productions with the > operator. This operator indicates that
everything prior to the > operator (including transitively) binds tighter
than what comes after. For example, a more complete grammar for simple
arithmetic might be:

syntax Exp ::= Exp "*" Exp
             | Exp "/" Exp
             > Exp "+" Exp
             | Exp "-" Exp

This indicates that multiplication and division bind tighter than addition
and subtraction, but that there is no relationship in priority between
multiplication and division.

As you may have noticed, this grammar is also ambiguous. x*y/z might refer to
x*(y/z) or to (x*y)/z. Indeed, if we removed division and subtraction
entirely, the grammar would still be ambiguous: x*y*z might parse as
x*(y*z), or as (x*y)*z. To resolve this, we introduce another feature:
associativity. Roughly, associativity tells us how symbols are allowed to nest
within other symbols with the same priority. If a set of symbols is left
associative, then symbols in that set cannot appear as the rightmost child
of other symbols in that set. If a set of symbols is right associative, then
symbols in that set cannot appear as the leftmost child of other symbols in
that set. Finally, if a set of symbols is non-associative, then symbols
in that set cannot appear as the rightmost or leftmost child of other symbols
in that set. For example, in the above example, if addition and subtraction
are left associative, then x+y+z will parse as (x+y)+z and x+y-z will
parse as (x+y)-z (because the other parse will have been rejected).

You might notice that this seems to apply only to binary infix operators. In
fact, the real behavior is slightly more complicated. Priority and
associativity (for technical reasons that go beyond the scope of this document)
really only apply when the rightmost or leftmost item in a production is a
nonterminal. If the rightmost nonterminal is followed by a terminal (or
respectively the leftmost preceded), priority and associativity do not apply.
Thus we can generalize these concepts to arbitrary context-free grammars.

Note that in some cases, this is not the behavior you want. You may actually
want to reject parses even though the leftmost and rightmost item in a
production are terminals. You can accomplish this by means of the
applyPriority attribute. When placed on a production, it tells the parser
which nonterminals of a production the priority filter ought to reject children
under, overriding the default behavior. For example, I might have a production
like syntax Exp ::= foo(Exp, Exp) [applyPriority(1)]. This tells the parser
to reject terms with looser priority binding under the first Exp, but not
the second. By default, the priority filter would apply to neither position
of this production, because the first and last items of the production
are both terminals.

Associativity is specified in K grammars by means of one of two different
mechanisms. The first, and simplest, adds the associativity of a priority block
of symbols prior to that block. For example, we can remove the remaining
ambiguities in the above grammar like so:

syntax Exp ::= left:
               Exp "*" Exp
             | Exp "/" Exp
             > right:
               Exp "+" Exp
             | Exp "-" Exp

This indicates that multiplication and division are left-associative, i.e., after
symbols with higher priority are parsed as innermost, symbols are nested with
the rightmost on top. Addition and subtraction are right-associative, which
is the opposite and indicates that symbols are nested with the leftmost on top.
Note that this is similar to, but different from, evaluation order, which also
concerns itself with the ordering of symbols and is described in the next
section.
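
The same block syntax also accepts non-assoc:. The following is a minimal
sketch in which a comparison operator is declared non-associative. The module
name and the choice of operators are illustrative assumptions, not taken from
an existing definition.

```k
module NON-ASSOC-SKETCH-SYNTAX
  imports INT-SYNTAX

  // Tightest block: literals and brackets.
  syntax Exp ::= Int
               | "(" Exp ")" [bracket]
               // Addition binds tighter than comparison and nests to the left.
               > left:
                 Exp "+" Exp
               // Comparison may not nest directly under itself on either side.
               > non-assoc:
                 Exp "<" Exp
endmodule
```

Under this grammar, 1 + 2 < 3 should parse as (1 + 2) < 3, while 1 < 2 < 3
should be rejected as a parse error unless brackets are added, because both
candidate parses place a comparison directly under another comparison.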

You may note we have not yet introduced the second syntax for priority
and associativity. In some cases, syntax for a grammar might be spread across
multiple modules, sometimes for very good reasons with respect to code
modularity. As a result, it becomes infeasible to declare priority and
associativity inline within a set of productions, because the productions
are not contiguous within a single file.

For this purpose, we introduce the equivalent syntax priority,
syntax left, syntax right, and syntax non-assoc declarations. For
example, the above grammar can be written equivalently as:

syntax Exp ::= Exp "*" Exp [group(mult)]
             | Exp "/" Exp [group(div)]
             | Exp "+" Exp [group(add)]
             | Exp "-" Exp [group(sub)]

syntax priority mult div > add sub
syntax left mult div
syntax right add sub

Here, the group(_) attribute is used to create user-defined groups of
sentences. A particular group name collectively refers to the whole set of
sentences within that group. The sets are flattened together, so we could
equivalently have written:

syntax Exp ::= Exp "*" Exp [group(mult)]
             | Exp "/" Exp [group(mult)]
             | Exp "+" Exp [group(add)]
             | Exp "-" Exp [group(add)]

syntax priority mult > add
syntax left mult
syntax right add

Note that syntax [left|right|non-assoc] should not be used to group together
productions with different priorities. For example, this code would be invalid:

syntax priority mult > add
syntax left mult add

Note that there is one other way to describe associativity, but it is
prone to a very common mistake. You can apply the attribute left, right,
or non-assoc directly to a production to indicate that it is, by itself,
left-, right-, or non-associative.

However, this often does not mean what users think it means. In particular:

syntax Exp ::= Exp "+" Exp [left]
             | Exp "-" Exp [left]

is not equivalent to:

syntax Exp ::= left:
               Exp "+" Exp
             | Exp "-" Exp

Under the first, each production is left-associative with itself, but not with
the other. Thus, x+y+z will parse unambiguously as (x+y)+z, but x+y-z will
be ambiguous. However, in the second, x+y-z will parse unambiguously as
(x+y)-z.

Think carefully about how you want your grammar to parse. In general, if you're
not sure, it's probably best to group associativity together into the same
blocks you use for priority, rather than using left, right, or non-assoc
attributes on the productions.

Lexical identifiers

Sometimes it is convenient to be able to give a certain regular expression a
name and then refer to it in one or more regular expression terminals. This
can be done with a syntax lexical sentence in K:

syntax lexical Alphanum = r"[0-9a-zA-Z]"

This defines a lexical identifier Alphanum which can be expanded in any
regular expression terminal to the above regular expression. For example, I
might choose to then implement the syntax of identifiers as follows:

syntax Id ::= r"[a-zA-Z]{Alphanum}*" [token]

Here {Alphanum} expands to the above regular expression, making the sentence
equivalent to the following:

syntax Id ::= r"[a-zA-Z]([0-9a-zA-Z])*" [token]

This feature can be used to more modularly construct the lexical syntax of your
language. Note that K does not currently check that lexical identifiers used
in regular expressions have been defined; this will generate an error when
creating the scanner, however, and the user ought to be able to debug what
happened.

assoc, comm, idem, and unit attributes

These attributes are used to indicate whether a collection or a production
is associative, commutative, idempotent, and/or has a unit.
In general, you should not need to apply these attributes to productions
yourself; however, they do have a special meaning to K. K will generate
axioms related to each of these concepts into your definition for you
automatically. It will also automatically sort associative-commutative
collections, and flatten the indentation of associative collections, when
unparsing.

public and private attribute

K allows users to declare certain pieces of syntax as either public or private.
All syntax is public by default. Public syntax can be used from any module that
imports that piece of syntax. A piece of syntax can be declared private with
the private attribute. This means that that syntax can only be used in the
module in which it is declared; it is not visible from modules that import
that module.

You can also change the default visibility of a module with the private
attribute, when it is placed directly on a module. A module with the private
attribute has all syntax private by default; this can be overridden on
specific sentences with the public attribute.

Note that the private module attribute also changes the default visibility
of imports; please refer to the appropriate section elsewhere in the manual
for more details.

Here is an example usage:

module WIDGET-SYNTAX

  syntax Widget ::= foo()
  syntax WidgetHelper ::= bar() [private] // this production is not visible
                                          // outside this module
endmodule

module WIDGET [private]
  imports WIDGET-SYNTAX

  syntax Widget ::= fooImpl() // this production is not visible outside this
                              // module

  // this production is visible outside this module
  syntax KItem ::= adjustWidget(Widget) [function, public]
endmodule

Configuration Declaration

exit attribute

A single configuration cell containing an integer may have the "exit"
attribute. This integer will then be used as the return value on the console
when executing the program.

For example:

configuration <k> $PGM:Pgm </k>
              <status-code exit=""> 1 </status-code>

declares that the cell status-code should be used as the exit-code for
invocations of krun. Additionally, we state that the default exit-code is 1
(an error state). One use of this is for writing testing harnesses which assume
that the test fails until proven otherwise and only set the <status-code> cell
to 0 if the test succeeds.
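
As a minimal sketch of such a test harness, the definition below starts with
exit code 1 and sets it to 0 only when the program reaches a success marker.
The module names and the pass and fail constructs are hypothetical and used
only for illustration.

```k
module EXIT-SKETCH-SYNTAX
  // Hypothetical one-word programs.
  syntax Pgm ::= "pass" | "fail"
endmodule

module EXIT-SKETCH
  imports EXIT-SKETCH-SYNTAX
  imports INT

  configuration <k> $PGM:Pgm </k>
                <status-code exit=""> 1 </status-code>

  // Only a passing program overwrites the default (failing) exit code.
  rule <k> pass => .K ... </k>
       <status-code> _ => 0 </status-code>
endmodule
```

With this definition, running the program fail leaves the rule unapplied, so
the <status-code> cell keeps its initial value of 1.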

Collection Cells: multiplicity and type attributes

Sometimes a semantics needs to allow multiple copies of the same cell, for
example, if you are defining a concurrent, multi-threaded programming language.
For this purpose, K supports the multiplicity and type attributes on cells
declared in the configuration.

multiplicity can take on values * and ?. Declaring multiplicity="*"
indicates that the cell may appear any number of times in a runtime
configuration. Setting multiplicity="?" indicates that the cell may only
appear exactly 0 or 1 times in a runtime configuration. If there are no
configuration variables present in the cell collection, the initial
configuration will start with exactly 0 instances of the cell collection. If
there are configuration variables present in the cell collection, the initial
configuration will start with exactly 1 instance of the cell collection.

type can take on values Set, List, and Map. For example, here we declare
several collection cells:

configuration <k> $PGM:Pgm </k>
              <sets>  <set  multiplicity="?" type="Set">  0:Int </set>  </sets>
              <lists> <list multiplicity="*" type="List"> 0:Int </list> </lists>
              <maps>
                <map multiplicity="*" type="Map">
                  <map-key> 0:Int </map-key>
                  <map-value-1> "":String </map-value-1>
                  <map-value-2> 0:Int     </map-value-2>
                </map>
              </maps>

Declaring type="Set" indicates that duplicate occurrences of the cell should
be de-duplicated, and accesses to instances of the cell will be nondeterministic
choices (constrained by any other parts of the match and side-conditions).
Similarly, declaring type="List" means that new instances of the cell can be
added at the front or back, and elements can be accessed from the front or back,
and the order of the cells will be maintained. The following are examples of
introduction and elimination rules for these collections:

rule <k> introduce-set(I:Int) => . ... </k>
     <sets> .Bag => <set> I </set> </sets>

rule <k> eliminate-set => I ... </k>
     <sets> <set> I </set> => .Bag </sets>

rule <k> introduce-list-start(I:Int) => . ... </k>
     <lists> (.Bag => <list> I </list>) ... </lists>

rule <k> introduce-list-end(I:Int) => . ... </k>
     <lists> ... (.Bag => <list> I </list>) </lists>

rule <k> eliminate-list-start => I ... </k>
     <lists> (<list> I </list> => .Bag) ... </lists>

rule <k> eliminate-list-end => I ... </k>
     <lists> ... (<list> I </list> => .Bag) </lists>

Notice that for multiplicity="?", we only admit a single <set> instance at
a time. For the type="List" cell, we can add/eliminate cells from the front or
back of the <lists> cell. Also note that we use .Bag to indicate the empty
cell collection in all cases.

Declaring type="Map" indicates that the first sub-cell will be used as a
cell-key. This means that matching on those cells will be done as a map-lookup
operation if the cell-key is mentioned in the rule (for performance). If the
cell-key is not mentioned, it will fall back to a normal nondeterministic choice
constrained by other parts of the match and any side-conditions. Note that there
is no special meaning to the name of the cells (in this case <map>,
<map-key>, <map-value-1>, and <map-value-2>). Additionally, any number of
sub-cells are allowed, and the entire instance of the cell collection is
considered part of the cell-value, including the cell-key (<map-key> in this
case) and the surrounding collection cell (<map> in this case).

For example, the following rules introduce, set, retrieve from, and eliminate
type="Map" cells:

rule <k> introduce-map(I:Int) => . ... </k>
     <maps> ... (.Bag => <map> <map-key> I </map-key> ... </map>) ... </maps>

rule <k> set-map-value-1(I:Int, S:String) => . ... </k>
     <map> <map-key> I </map-key> <map-value-1> _ => S </map-value-1> ... </map>

rule <k> set-map-value-2(I:Int, V:Int) => . ... </k>
     <map> <map-key> I </map-key> <map-value-2> _ => V </map-value-2> ... </map>

rule <k> retrieve-map-value-1(I:Int) => S ... </k>
     <map> <map-key> I </map-key> <map-value-1> S </map-value-1> ... </map>

rule <k> retrieve-map-value-2(I:Int) => V ... </k>
     <map> <map-key> I </map-key> <map-value-2> V </map-value-2> ... </map>

rule <k> eliminate-map(I:Int) => . ... </k>
     <maps> ... (<map> <map-key> I </map-key> ... </map> => .Bag) ... </maps>

Note how each rule makes sure that <map-key> cell is mentioned, and we
continue to use .Bag to indicate the empty collection. Also note that
when introducing new map elements, you may omit any of the sub-cells which are
not the cell-key. If you do omit sub-cells, you must use structural
framing ... to indicate the missing cells; they will receive the default
values given in the configuration ... declaration.
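
To connect this back to the multi-threading use case, here is a rough sketch
of a thread pool modeled as a type="Map" collection keyed by a thread id. All
names (the modules, spawn, done, <tid>, <next-tid>) are assumptions made for
this illustration and do not come from an existing semantics.

```k
module THREADS-SKETCH-SYNTAX
  // Hypothetical programs: spawn another program, or finish.
  syntax Pgm ::= spawn(Pgm) | "done"
endmodule

module THREADS-SKETCH
  imports THREADS-SKETCH-SYNTAX
  imports INT

  configuration <threads>
                  <thread multiplicity="*" type="Map">
                    <tid> 0 </tid>          // first sub-cell: the cell-key
                    <k> $PGM:Pgm </k>
                  </thread>
                </threads>
                <next-tid> 1 </next-tid>

  // Create a new <thread> instance with a fresh id for the spawned program.
  rule <thread> <tid> _ </tid> <k> spawn(P) => done ... </k> </thread>
       (.Bag => <thread> <tid> N </tid> <k> P </k> </thread>)
       <next-tid> N => N +Int 1 </next-tid>

  // Remove a thread once nothing but done is left in its <k> cell.
  rule <threads> ... (<thread> <tid> _ </tid> <k> done </k> </thread> => .Bag) ... </threads>
endmodule
```

Because the <thread> cell contains the configuration variable $PGM, the
initial configuration starts with exactly one thread instance.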

Rule Declaration

Rule Structure

Each K rule follows the same basic structure (given as an example here):

rule LHS => RHS requires REQ ensures ENS [ATTRS]

The portion between rule and requires is referred to as the rule body,
and may contain one or more rewrites (though not nested). Here, the rule body is
LHS => RHS, where LHS and RHS are used as placeholders for the pre- and
post- states. Note that we lose no generality referring to the LHS or the
RHS, even in the presence of multiple rewrites, as the rewrites are pulled to
the top-level anyway.

Next is the requires clause, represented here as REQ. The requires clause is
an additional predicate (function-like term of sort Bool), which is to be
evaluated before applying the rule. If the requires clause does not evaluate to
true, then the rule does not apply.

Finally is the ensures clause, represented here as ENS. The ensures clause
is to be interpreted as a post-condition, and will be automatically added to the
path condition if the rule applies. It may cause the entire term to become
undefined, but the backend will not stop itself from applying the rule in this
case. Note that concrete backends (e.g., the LLVM backend) are free to ignore the
ensures clause.

Overall, the transition represented by such a rule is from a state
LHS #And REQ ending in a state RHS #And ENS. When backends apply this rule
as a transition/rewrite, they should:

  • Check if pattern LHS matches (or unifies) with the current term, giving
    substitution alpha.
  • Check if the instantiation alpha(REQ) is valid (or satisfiable).
  • Build the new term alpha(RHS #And ENS), and check if it's satisfiable.
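
The pieces above can be sketched in a small definition. The constructs half
and pickPositive are hypothetical. The second rule uses an existential
variable and is meant for a symbolic backend; a concrete backend may reject
or ignore it.

```k
module RULE-STRUCTURE-SKETCH
  imports INT
  imports BOOL

  // Hypothetical constructs used only for this illustration.
  syntax KItem ::= half(Int) | pickPositive()

  // LHS is half(I), RHS is I /Int 2, REQ says that I is even.
  rule half(I:Int) => I /Int 2
    requires I modInt 2 ==Int 0

  // ENS constrains the existential variable ?X in the post-state.
  rule pickPositive() => ?X:Int
    ensures ?X >Int 0
endmodule
```

Here half(3) does not rewrite, because the instantiated requires clause
evaluates to false, whereas half(4) rewrites to 2.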

Pattern Matching operator

Sometimes when you want to express a side condition, you want to say that a
rule matches if a particular term matches a particular pattern, or if it
instead does /not/ match a particular pattern.

The syntax in K for this is :=K and :/=K. These have a similar meaning to ==K and
=/=K, except that where ==K and =/=K express equality, :=K and :/=K express
model membership, that is, whether or not the rhs is a member of the set
of terms expressed by the lhs pattern. Because the lhs of these operators is a
pattern, the user can use variables in the lhs of the operator. However, due to
current limitations, these variables are NOT bound in the rest of the term.
The user is thus encouraged to use anonymous variables only, although this is
not required.

This is compiled by the K frontend down to an efficient pattern matching on a
fresh function symbol.
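
As a small sketch of how these operators might appear in side conditions,
consider the following definition. The sort Shape and the symbols circle,
square, classify, round, and angular are hypothetical, and the operators are
assumed to be available through the K-EQUAL module.

```k
module PATTERN-MATCH-SKETCH
  imports BOOL
  imports K-EQUAL

  // Hypothetical sorts and symbols for illustration.
  syntax Shape ::= circle(Int) | square(Int)
  syntax KItem ::= classify(Shape) | "round" | "angular"

  // Applies when S matches the pattern circle(_).
  rule classify(S) => round   requires circle(_) :=K S

  // Applies when S does not match the pattern circle(_).
  rule classify(S) => angular requires circle(_) :/=K S
endmodule
```

Only the anonymous variable _ appears in the patterns, in line with the
recommendation above, since variables in the lhs pattern are not bound in the
rest of the rule.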

Anonymous function applications

There are a number of cases in K where you would prefer to be able to take some
term on the RHS, bind it to a variable, and refer to it in multiple different
places in a rule.

You might also prefer to take a variable for which you know some of its
structure, and modify some of its internal structure without requiring you to
match on every single field contained inside that structure.

In order to do this, we introduce syntax to K that allows you to construct
anonymous functions in the RHS of a rule and apply them to a term.

The syntax for this is:

#fun(RuleBody)(Argument)

Note the limitations currently imposed by the implementation. These functions
are not first-class: you cannot bind them to a variable and inject them like
you can with a regular klabel for a function. You also cannot express multiple
rules or multiple parameters, or side conditions. All of these are extensions
we would like to support in the future, however.

In the following, we use three examples to illustrate the behavior of #fun.
We point out that the support for #fun is provided by the frontend, not the
backends.

The three examples are real examples borrowed or modified from existing language
semantics.

Example 1 (A Simple Self-Explained Example).

#fun(V:Val => isFoo(V) andBool isBar(V))(someFunctionReturningVal())

Example 2 (Nested #fun).

   #fun(C
=> #fun(R
=> #fun(E
=> foo1(E, R, C)
  )(foo2(C))
  )(foo3(0))
  )(foo4(1))

This example is from the beacon semantics
(https://github.com/runtimeverification/beacon-chain-spec/blob/master/beacon-chain.k,
at line 302), with some modification for simplicity. Note how
variables C, R, E are bound in the nested #fun.

Example 3 (Matching a structure).

rule foo(K, RECORD) =>
  #fun(record(... field: _ => K))(RECORD)

Unlike previous examples, the LHS of #fun in this example is no longer a
variable, but a structure. It has the same spirit as the first two examples,
but we match the RECORD with a structure record( DotVar, field: X), instead
of a standalone variable. We also use K's local rewrite syntax (i.e., the
rewriting symbol => does not occur at the top-level) to prevent writing
duplicate expressions on the LHS and RHS of the rewriting.

Macros and Aliases

A production can be tagged with the macro, alias, macro-rec, or alias-rec
attributes. In all cases, what this signifies is that this is a macro production.
Macro rules are rules where the top symbol of the left-hand side is a macro
label. Macro rules are applied statically during compilation on all terms that
they match, and statically before program execution on the initial configuration.
Currently, macro rules are required to not have side conditions, although they
can contain sort checks.

alias rules are also applied statically in reverse prior to unparsing on the
final configuration. Note that a macro rule can have unbound variables in the
right hand side. When such a macro exists, it should be used only on the left
hand side of rules, unless the user is performing symbolic execution and expects
to introduce symbolic terms into the subject being rewritten.

However, when used on the left hand side of a rule, it functions similarly to a
pattern alias, and allows the user to concisely express a reusable pattern that
they wish to match on in multiple places.

For example, consider the following semantics:

syntax KItem ::= "foo" [alias] | "foobar"
syntax KItem ::= bar(KItem) [macro] | baz(Int, KItem)
rule foo => foobar
rule bar(I) => baz(?_, I)
rule bar(I) => I

This will rewrite baz(0, foo) to foo. First baz(0, foo) will be rewritten
statically to baz(0, foobar). Then the non-macro rule will apply (because
the rule will have been rewritten to rule baz(_, I) => I). Then foobar will
be rewritten statically after rewriting finishes to foo via the reverse form
of the alias.

Note that macros do not apply recursively within their own expansion. This is
done so as to ensure that macro expansion will always terminate. If the user
genuinely desires a recursive macro, the macro-rec and alias-rec attributes
can be used to provide this behavior.

For example, consider the following semantics:

syntax Exp ::= "int" Exp ";" | "int" Exps ";" [macro] | Exp Exp | Id
syntax Exps ::= List{Exp,","}

rule int X:Id, X':Id, Xs:Exps ; => int X ; int X', Xs ;

This will expand int x, y, z; to int x; int y, z; because the macro does
not apply the second time after applying the substitution of the first
application. However, if the macro attribute were changed to the macro-rec
attribute, it would instead expand (as the user likely intended) to
int x; int y; int z;.

The alias-rec attribute behaves with respect to the alias attribute the
same way the macro-rec attribute behaves with respect to macro.

anywhere rules

Some rules are not functional, but you want them to apply anywhere in the
configuration (similar to functional rules). You can use the anywhere
attribute on a rule to instruct the backends to make sure they apply anywhere
they match in the entire configuration.

For example, if you want to make sure that some associative operator is always
right-associated anywhere in the configuration, you can do:

syntax Stmt ::= Stmt ";" Stmt

rule (S1 ; S2) ; S3 => S1 ; (S2 ; S3) [anywhere]

Then after every step, all occurrences of _;_ will be re-associated. Note that
this allows the symbol _;_ to still be a constructor, even though it is
simplified similarly to a function.

trusted claims

You may add the trusted attribute to a given claim for the K prover to
automatically add it to the list of proven circularities, instead of trying to
discharge it separately.
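
As a sketch of the syntax, the specification below marks a loop invariant as
trusted, so that it is assumed rather than proven, and leaves a second claim
to be discharged normally. The SUM semantics, including sumTo and loop, is
hypothetical and exists only to make the example self-contained. No statement
is made here about whether a particular prover version discharges the second
claim.

```k
module SUM
  imports INT

  // Hypothetical constructs: sum the integers from N down to 1.
  syntax KItem ::= sumTo(Int) | loop(Int, Int)

  rule sumTo(N) => loop(N, 0)
  rule loop(0, S) => S
  rule loop(N, S) => loop(N -Int 1, S +Int N) requires N >Int 0
endmodule

module SUM-SPEC
  imports SUM

  // Assumed without proof: added directly to the proven circularities.
  claim <k> loop(N:Int, S:Int) => S +Int ((N *Int (N +Int 1)) /Int 2) ... </k>
    requires N >=Int 0
    [trusted]

  // Not trusted: the prover is expected to discharge this claim itself.
  claim <k> sumTo(N:Int) => (N *Int (N +Int 1)) /Int 2 ... </k>
    requires N >=Int 0
endmodule
```

Since a trusted claim is never checked, an incorrect one can make other
proofs succeed unsoundly, so it is best reserved for claims established
elsewhere.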

Projection and Predicate functions

K automatically generates certain predicate and projection functions from the
syntax you declare. For example, if you write:

syntax Foo ::= foo(bar: Bar)

It will automatically generate the following K code:

syntax Bool ::= isFoo(K) [function]
syntax Foo ::= "{" K "}" ":>Foo" [function]
syntax Bar ::= bar(Foo) [function]

rule isFoo(F:Foo) => true
rule isFoo(_) => false [owise]

rule { F:Foo }:>Foo => F
rule bar(foo(B:Bar)) => B

The first two types of functions are generated automatically for every sort in
your K definition, and the third type of function is generated automatically
for each named nonterminal in your definition. Essentially, isFoo for some
sort Foo will tell you whether a particular term of sort K is a Foo,
{F}:>Foo will cast F to sort Foo if F is of sort Foo and will be
undefined (i.e., theoretically defined as #Bottom, the bottom symbol in
matching logic) otherwise. Finally, bar will project out the child of a foo
named bar in its production declaration.
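
The sketch below shows how the generated functions might be used in rules. The
sort Point, its named nonterminals xCoord and yCoord, and the functions
manhattan and toInt are hypothetical names chosen for this illustration.

```k
module PROJECTION-SKETCH
  imports INT
  imports BOOL

  // Hypothetical sort with two named nonterminals.
  syntax Point ::= point(xCoord: Int, yCoord: Int)

  syntax Int ::= manhattan(Point) [function]
  // xCoord and yCoord are the generated projection functions.
  rule manhattan(P) => absInt(xCoord(P)) +Int absInt(yCoord(P))

  syntax Int ::= toInt(KItem) [function]
  // isInt is the generated predicate; { _ }:>Int is the generated cast.
  rule toInt(X) => { X }:>Int requires isInt(X)
endmodule
```

Guarding the cast with isInt keeps toInt from evaluating the cast on terms
for which it would be undefined.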

Note that if another term of equal or smaller sort to Foo exists and has a
child named bar of equal or smaller sort to Bar, this will generate an
ambiguity during parsing, so care should be taken to ensure that named
nonterminals are sufficiently unique from one another to prevent such
ambiguities. Of course, the compiler will generate a warning in this case.

simplification attribute

The simplification attribute identifies rules outside the main semantics that
are used to simplify function patterns.

Conditions: A simplification rule is applied by matching the function
arguments, instead of unification as when applying function definition
rules. This allows function symbols to appear nested as arguments to other
functions on the left-hand side of a simplification rule, which is forbidden in
function definition rules. For example, this rule would not be accepted as a
function definition rule:

rule (X +Int Y) +Int Z => X +Int (Y +Int Z) [simplification]

A simplification rule is only applied when the current side condition implies
the requires clause of the rule, like function definition rules.

Order: The simplification attribute accepts an optional integer argument
which is the rule's simplification priority; if the optional argument is not
specified, it is equivalent to a simplification priority of 50. Backends
should attempt simplification rules in order of their simplification
priority, but are not required to do so; in fact, the backend is free to apply
simplification rules at any time. Because of this, users must ensure that
simplification rules are sound regardless of their order of application. This
differs from the priority attribute in that rules with the priority
attribute must be applied in their priority order by the backend. It is an
error to have the priority attribute on a simplification rule.
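
As a brief sketch of the syntax, the two rules below carry an explicit and an
implicit simplification priority. The module name is hypothetical, and the
comment on ordering assumes that, as with the priority attribute, lower values
are attempted first.

```k
module SIMPLIFICATION-PRIORITY-SKETCH
  imports INT

  // Explicit priority 40: attempted before default-priority rules,
  // assuming lower values are tried first.
  rule X:Int +Int 0 => X [simplification(40)]

  // No argument: equivalent to simplification(50).
  rule (X:Int +Int Y:Int) +Int Z:Int => X +Int (Y +Int Z) [simplification]
endmodule
```

Both rules remain sound in either order of application, which is what the
backend's freedom to reorder them requires.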

For example, for the following definition:

    syntax WordStack ::= Int ":" WordStack | ".WordStack"
    syntax Int ::= sizeWordStack    ( WordStack       ) [function]
                 | sizeWordStackAux ( WordStack , Int ) [function]
 // --------------------------------------------------------------
    rule sizeWordStack(WS) => sizeWordStackAux(WS, 0)

    rule sizeWordStackAux(.WordStack, N) => N
    rule sizeWordStackAux(W : WS    , N) => sizeWordStackAux(WS, N +Int 1)

We might add the following simplification lemma:

    rule sizeWordStackAux(WS, N) => N +Int sizeWordStackAux(WS, 0)
      requires N =/=Int 0
      [simplification]

Then this simplification rule will only apply if the Haskell backend can prove
that notBool N =/=Int 0 is unsatisfiable. This avoids an infinite cycle of
applying this simplification lemma.

NOTE: The frontend and Haskell backend do not check that supplied
simplification rules are sound; this is the developer's responsibility. In
particular, rules with the simplification attribute must preserve definedness;
that is, if the left-hand side refers to any partial function then:

  • the right-hand side must be #Bottom when the left-hand side is #Bottom, or
  • the rule must have an ensures clause that is false when the left-hand
    side is #Bottom, or
  • the rule must have a requires clause that is false when the left-hand
    side is #Bottom.

These conditions are in order of decreasing preference: the best option is to
preserve #Bottom on the right-hand side, the next best option is to have an
ensures clause, and the least-preferred option is to have a requires clause.
The most preferred option is to write total functions and avoid the entire issue.

NOTE: The Haskell backend does not attempt to prove claims whose right-hand
side is #Bottom. The reason for this is that the general case is undecidable,
and the backend might enter an infinite loop. Therefore, the backend emits a
warning if it encounters such a claim.

concrete and symbolic attributes (Haskell backend)

Users can control the application of simplification rules using the concrete
and the symbolic attributes by specifying the type of patterns the rule's
arguments are to match.

A concrete pattern is a pattern which does not contain variables or unevaluated
functions, otherwise the pattern is symbolic.

The semantics of the two attributes is defined as follows:

  • If a simplification rule is marked concrete, then all arguments must be
    concrete for the rule to match.
  • If a simplification rule is marked symbolic, then all arguments must be
    symbolic for the rule to match.
  • The following syntax concrete(<variables>) (resp. symbolic(<variables>)),
    where <variables> is a list of variable names separated by commas, can be used
    to specify the exact arguments the user expects to match concrete (resp. symbolic)
    patterns.

For example, the following will only match when all arguments
are concrete:

rule X +Int (Y +Int Z) => (X +Int Y) +Int Z [simplification, concrete]

Conversely, the following will only match when all arguments
are symbolic:

rule X +Int (Y +Int Z) => (X +Int Y) +Int Z [simplification, symbolic]

In practice, the following rules will re-associate and commute terms to combine
concrete arguments:

rule (A +Int Y) +Int Z => A +Int (Y +Int Z)
  [concrete(Y, Z), symbolic(A), simplification]

rule X +Int (B +Int Z) => B +Int (X +Int Z)
  [concrete(X, Z), symbolic(B), simplification]

The unboundVariables attribute

Normally, K rules are not allowed to contain regular (i.e., not fresh, not
existential) variables in the RHS / requires / ensures clauses which are not
bound in the LHS.

However, in certain cases this behavior might be desired, like, for example,
when specifying a macro rule which is to be used in the LHS of other rules.
To allow for such cases, but still be useful and perform the unboundness checks
in regular cases, the unboundVariables attribute allows the user to specify
a comma-separated list of names of variables which can be unbound in the rule.

For example, in the macro declaration

  rule cppEnumType => bar(_, scopedEnum() #Or unscopedEnum() ) [unboundVariables(_)]

the declaration unboundVariables(_) allows the rule to pass the unbound
variable checks, and this in turn allows for cppEnumType to be used in
the LHS of a rule to mean the pattern above:

  rule inverseConvertType(cppEnumType, foo((cppEnumType #as T::CPPType => underlyingType(T))))

The memo attribute

The memo attribute is a hint from the user to the backend to memoize a
function. Not all backends support memoization, but when the attribute is used
and the definition is compiled for a memo-supporting backend, then calls to
the function may be cached. At the time of writing, only the Haskell
backend supports memoization.

Limitations of memoization with the Haskell backend

The Haskell backend will only cache a function call if all arguments are concrete.

It is recommended not to memoize recursive functions, as each recursive call
will be stored in the cache, but only the first iteration will be retrieved from
the cache; that is, the cache will be filled with many unreachable
entries. Instead, we recommend performing a worker-wrapper transformation on
recursive functions, and applying the memo attribute to the wrapper.
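
A worker-wrapper split might look like the following sketch. The function
names sumTo and sumToAux and the module name are hypothetical, chosen only for
illustration: the wrapper is the single memoized entry point, and the
recursive worker is left unmemoized so that intermediate calls do not fill the
cache.

```k
module MEMO-SKETCH  // hypothetical module name
  imports INT

  // Wrapper: the only memoized entry point, so one cache entry per call.
  syntax Int ::= sumTo(Int) [function, memo]
  // Worker: does the recursion and is deliberately not memoized.
  syntax Int ::= sumToAux(Int, Int) [function]

  rule sumTo(N) => sumToAux(N, 0)

  rule sumToAux(N, Acc) => Acc                            requires N <=Int 0
  rule sumToAux(N, Acc) => sumToAux(N -Int 1, Acc +Int N) requires N >Int 0
endmodule
```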

Warning: A function declared with the memo attribute must not use
uninterpreted functions in the side-condition of any rule. Memoizing such an
impure function is unsound. To see why, consider the following rules:

syntax Bool ::= impure( Int ) [function]

syntax Int ::= unsound( Int ) [function, memo]
rule unsound(X:Int) => X +Int 1 requires impure(X)
rule unsound(X:Int) => X        requires notBool impure(X)

Because the function impure is not given rules to cover all inputs, unsound
can be memoized incoherently. For example,

{unsound(0) #And {impure(0) #Equals true}} #Equals 1

but

{unsound(0) #And {impure(0) #Equals false}} #Equals 0

The memoized value of unsound(0) would be incoherently determined by which
pattern the backend encounters first.

Variable Sort Inference

In K, it is not required that users declare the sorts of variables in rules or
in the initial configuration. If the user does not explicitly declare the sort
of a variable somewhere via a cast (see below), the sort of the variable is
inferred from context based on the sort signature of every place the variable
appears in the rule.

As an example, consider the rule for addition in IMP:

    syntax Exp ::= Exp "+" Exp | Int

    rule I1 + I2 => I1 +Int I2

Here +Int is defined in the INT module with the following signature:

    syntax Int ::= Int "+Int" Int [function]

In the rule above, the sort of both I1 and I2 is inferred as Int. This is because
a variable must have the same sort every place it appears within the same rule.
While a variable appearing only on the left-hand-side of the rule could have
sort Exp instead, the same variable appears as a child of +Int, which
constrains the sorts of I1 and I2 more tightly. Since the sort must be a
subsort of Int or equal to Int, and Int has no subsorts, we infer Int
as the sorts of I1 and I2. This means that the above rule will not match
until I1 and I2 become integers (i.e., have already been evaluated).

More complex examples are possible, however:

    syntax Exp ::= Exp "+" Int | Int
    rule _ + _ => 0

Here we have two anonymous variables. They do not refer to the same variable
as one another, so they can have different sorts. The right side is constrained
by + to be of sort Int, but the left side could be either Exp or Int.
When this occurs, we have multiple solutions to the sorts of the variables in
the rule. K will only choose solutions which are maximal, however. To be
precise, if two different solutions exist, but the sorts of one solution are
all greater than or equal to the sorts of the other solution, K will discard
the smaller solution. Thus, in the case above, the variable on the left side
of the + is inferred to have sort Exp, because the solution (Exp, Int) is
strictly greater than the solution (Int, Int).

It is possible, however, for terms to have multiple maximal solutions:

    syntax Exp ::= Exp "+" Int | Int "+" Exp | Int
    rule I1 + I2 => 0

In this example, there is an ambiguous parse. This could parse as either
the first + or the second. In the first case, the maximal solution chosen is
(Exp, Int). In the second, it is (Int, Exp). Neither of these solutions is
greater than the other, so both are allowed by K. As a result, this program
will emit an error because the parse is ambiguous. To pick one solution over
the other, a cast or a prefer or avoid attribute can be used.
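
As a sketch of the cast-based fix (the module name is hypothetical, and the
casts themselves are described under Casting), annotating the variables should
leave only one well-sorted parse, because a term of sort Exp cannot appear
where the second production requires an Int:

```k
module SORT-INFERENCE-SKETCH  // hypothetical module name
  imports INT

  syntax Exp ::= Exp "+" Int | Int "+" Exp | Int

  // I1 is pinned to Exp, so it cannot be the Int child required by the
  // second production; only the parse as the first production remains.
  rule I1:Exp + I2:Int => 0
endmodule
```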

Casting

There are three main types of casts in K: the semantic cast, the strict cast,
and the projection cast.

Semantic casts

For every sort S declared in your grammar, K will define the following
production for you for use in rules:

    syntax S ::= S ":S"

The meaning of this cast is that the term inside the cast must be less than
or equal to S. This can be used to resolve ambiguities, but its principal
purpose is to guide execution by telling K what sort variables must match in
order for the rule to apply. When compiled, it will generate a pattern that
matches on an injection into S.

Strict casts

K also introduces the strict cast:

    syntax S ::= S "::S"

The meaning at runtime is exactly the same as the semantic cast; however, it
restricts the sort of the term inside the cast to exactly S. That is
to say, if you use it on something that is a strictly smaller sort, it will
generate a type error. This is useful in certain circumstances to help
disambiguate terms, when a semantic cast would not have resolved the ambiguity.
As such, it is primarily used to solve ambiguities rather than to guide
execution.

Projection casts

K also introduces the projection cast:

    syntax {S2} S ::= "{" S2 "}" ":>S"

The meaning of this cast at runtime is that if the term inside is of sort
S, it should have its injection stripped away and the value inside is
returned as a term of static sort S. However, if the term is of a
different sort, it is an error and execution will get stuck. Thus the primary
usefulness of this cast is to cast the return value of a function with a
greater sort down to a strictly smaller sort that you expect the return value
of the function to have. For example:

    syntax Exp ::= foo(Exp) [function] | bar(Int) | Int
    rule foo(I:Int) => I
    rule bar(I) => bar({foo(I +Int 1)}:>Int)

Here we know that foo(I +Int 1) will return an Int, but the return sort of
foo is Exp. So we project the result into the Int sort so that it can
be placed as the child of a bar.

owise and priority attributes

Sometimes, it is simply not convenient to explicitly describe every
single negative case under which a rule should not apply. Instead,
we simply wish to say that a rule should only apply after some other set of
rules have been tried. K introduces two different attributes that can be
added to rules which will automatically generate the necessary matching
conditions in a manner which is performant for concrete execution (indeed,
during concrete execution it generally outperforms code where the conditions
are written explicitly).

The first is the owise attribute. Very roughly, rules without an attribute
indicating their priority apply first, followed by rules with the owise
attribute only if all the other rules have been tried and failed. For example,
consider the following function:

syntax Int ::= foo(Int) [function]
rule foo(0) => 0
rule foo(_) => 1 [owise]

Here foo(0) is defined explicitly as 0. Any other integer yields the
integer 1. In particular, the second rule above will only be tried after the
first rule has been shown not to apply.

This is because the first rule has a lower number assigned for its priority
than the second rule. In practice, each rule in your semantics is implicitly
or explicitly assigned a numerical priority. Rules are tried in increasing
order of priority, starting at zero and trying each increasing numerical value
successively.

You can specify the priority of a rule with the priority attribute. For
example, we could equivalently write the second rule above as:

rule foo(_) => 1 [priority(200)]

The number 200 is not chosen at random. In fact, when you use the owise
attribute, what you are doing is implicitly setting the priority of the rule
to 200. This has a couple of implications:

  1. Multiple rules with the owise attribute all have the same priority and thus
    can apply in any order.
  2. Rules with priority higher than 200 apply after all rules with the
    owise attribute have been tried.

There is one more rule by which priorities are assigned: a rule with no
attributes indicating its priority is assigned the priority 50. Thus,
with each priority explicitly declared, the above example looks like:

syntax Int ::= foo(Int) [function]
rule foo(0) => 0 [priority(50)]
rule foo(_) => 1 [priority(200)]

One final note: the llvm backend reserves priorities between 50 and 150
inclusive for certain specific purposes. Because of this, explicit
priorities which are given within this region may not behave precisely as
described above. This is primarily in order that it be possible where necessary
to provide guidance to the pattern matching algorithm when it would otherwise
make bad choices about which rules to try first. You generally should not
give any rule a priority within this region unless you know exactly what the
implications are with respect to how the llvm backend orders matches.
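
To see the three tiers side by side, consider the following sketch. The
function sign and the module name are hypothetical, and the explicit priority
40 is chosen to lie outside the region reserved by the llvm backend. The first
two rules overlap on the input 0, and the lower priority number decides which
one is tried first.

```k
module PRIORITY-SKETCH  // hypothetical module name
  imports INT

  syntax Int ::= sign(Int) [function]

  rule sign(0) => 0 [priority(40)]        // tried first: 40 < 50
  rule sign(N) => 1 requires N >=Int 0    // no attribute: priority 50
  rule sign(_) => -1 [owise]              // owise: priority 200
endmodule
```

Under the ordering described above, sign(0) is handled by the first rule even
though the second rule would also match it, and negative inputs fall through
to the owise rule.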

Evaluation Strategy

strict and seqstrict attributes

The strictness attributes allow defining evaluation strategies without having
to explicitly make rules which implement them. This is done by injecting
heating and cooling rules for the subterms. For this to work, you need to
define what a result is for K, by extending the KResult sort.

For example:

syntax AExp ::= Int
              | AExp "+" AExp [strict, klabel(addExp)]

This generates two heating rules (where the hole syntaxes "[]" "+" AExp and
AExp "+" "[]" are automatically added to create an evaluation context):

rule [addExp1-heat]: <k> HOLE:AExp +  AE2:AExp => HOLE ~>  [] + AE2 ... </k> [heat]
rule [addExp2-heat]: <k>  AE1:AExp + HOLE:AExp => HOLE ~> AE1 +  [] ... </k> [heat]

And two corresponding cooling rules:

rule [addExp1-cool]: <k> HOLE:AExp ~>  [] + AE2 => HOLE +  AE2 ... </k> [cool]
rule [addExp2-cool]: <k> HOLE:AExp ~> AE1 +  [] =>  AE1 + HOLE ... </k> [cool]

Note that the rules are given labels based on the klabel of the production, which
nonterminal is the hole, and whether it's the heating or the cooling rule.

You will note that these rules can apply one after another infinitely. In
practice, the KResult sort is used to break this cycle by ensuring that only
terms that are not part of the KResult sort will be heated. The heat and
cool attributes are used to tell the compiler that these are heating and
cooling rules and should be handled in the manner just described. Nothing stops
the user from writing such heating and cooling rules directly if they wish,
although we describe other more convenient syntax for most of the advanced
cases below.

One other thing to note is that in the above sentences, HOLE is just a
variable, but it has special meaning in the context of sentences with the
heat or cool attribute. In heating or cooling rules, the variable named
HOLE is considered to be the term being heated or cooled and the compiler
will generate isKResult(HOLE) and notBool isKResult(HOLE) side conditions
appropriately to ensure that the backend does not loop infinitely. The module
BOOL will also be automatically and privately included for semantic
purposes. The syntax for parsing programs will not be affected.

In order for this functionality to work, you need to define the KResult sort.
For instance, we tell K that a term is fully evaluated once it becomes an Int
here:

syntax KResult ::= Int
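
Putting the pieces together, a minimal self-contained sketch might look as
follows. The module name is hypothetical, and the left attribute is an
addition made here only so that chained sums parse unambiguously. The single
semantic rule relies on the generated heating and cooling rules to reduce
both operands first.

```k
module STRICT-SKETCH  // hypothetical module name
  imports INT

  syntax AExp ::= Int
                | AExp "+" AExp [left, strict]

  // Integers are fully evaluated, so they are never heated.
  syntax KResult ::= Int

  // Fires once heating and cooling have reduced both operands to Int.
  rule <k> I1:Int + I2:Int => I1 +Int I2 ... </k>
endmodule
```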

Note that you can also say that a given expression is strict only in
specific argument positions. Here we use this to define "short-circuiting"
boolean operators.

syntax KResult ::= Bool

syntax BExp ::= Bool
              | BExp "||" BExp [strict(1)]
              | BExp "&&" BExp [strict(1)]

rule <k> true  || _    => true ... </k>
rule <k> false || REST => REST ... </k>

rule <k> true  && REST => REST  ... </k>
rule <k> false && _    => false ... </k>

If you want to force a specific evaluation order of the arguments, you can use
the variant seqstrict to do so. For example, this would make the boolean
operators evaluate their second argument first:

syntax KResult ::= Bool

syntax BExp ::= Bool
              | BExp "||" BExp [seqstrict(2,1)]
              | BExp "&&" BExp [seqstrict(2,1)]

rule <k> _    || true  => true ... </k>
rule <k> REST || false => REST ... </k>

rule <k> REST && true  => REST  ... </k>
rule <k> _    && false => false ... </k>

This will generate rules like this in the case of _||_ (note that BE1 will
not be heated unless isKResult(BE2) is true, meaning that BE2 must be
evaluated first):

rule <k>  BE1:BExp || HOLE:BExp => HOLE ~> BE1 ||  [] ... </k> [heat]
rule <k> HOLE:BExp ||  BE2:BExp => HOLE ~>  [] || BE2 ... </k> requires isKResult(BE2) [heat]

rule <k> HOLE:BExp ~>  [] || BE2 => HOLE ||  BE2 ... </k> [cool]
rule <k> HOLE:BExp ~> BE1 ||  [] =>  BE1 || HOLE ... </k> [cool]

Context Declaration

Sometimes more advanced evaluation strategies are needed. By default, the
strict and seqstrict attributes are limited in that they cannot describe
the context in which heating or cooling should occur. When this type of
control over the evaluation strategy is required, context sentences can be
used to simplify the process of declaring heating and cooling when it would be
unnecessarily verbose to write heating and cooling rules directly.

For example, to heat a term of sort Bar whenever it appears under a foo
constructor, one might write the following context (with the optional
label):

context [foo]: foo(HOLE:Bar)

Once again, note that HOLE is just a variable, but one that has special
meaning to the compiler indicating the position in the context that should
be heated or cooled.

This will automatically generate the following sentences:

rule [foo-heat]: <k> foo(HOLE:Bar) => HOLE ~> foo([]) ... </k> [heat]
rule [foo-cool]: <k> HOLE:Bar ~> foo([]) => foo(HOLE) ... </k> [cool]

The user may also write the K cell explicitly in the context declaration
if they want to match on another cell as well, for example:

context <k> foo(HOLE:Bar) ... </k> <state> .Map </state>

This context will now only heat or cool if the state cell is empty.

Side conditions in context declarations

The user is allowed to write a side condition in a context declaration, like
so:

context foo(HOLE:Bar) requires baz(HOLE)

This side condition will be appended verbatim to the heating rule that is
generated; however, it will not affect the cooling rule that is generated:

rule <k> foo(HOLE:Bar) => HOLE ~> foo([]) ... </k> requires baz(HOLE) [heat]
rule <k> HOLE:Bar ~> foo([]) => foo(HOLE) ... </k> [cool]

Rewrites in context declarations

The user can also include exactly one rewrite operation in a context
declaration if that rewrite takes the variable HOLE on the left hand side
to a term containing HOLE on the right hand side. For example:

context foo(HOLE:Bar => bar(HOLE))

In this case, the code generated will be as follows:

rule <k> foo(HOLE:Bar) => bar(HOLE) ~> foo([]) ... </k> [heat]
rule <k> bar(HOLE:Bar) ~> foo([]) => foo(HOLE) ... </k> [cool]

This can be useful if the user wishes to evaluate a term using a different
set of rules than normal.

result attribute

Sometimes it is necessary to be able to evaluate a term to a different sort
than KResult. This is done by means of adding the result attribute to
a strict production, a context, or an explicit heating or cooling rule:

syntax BExp ::= Bool
              | BExp "||" BExp [seqstrict(2,1), result(Bool)]

In this case, the sort check used by seqstrict and by the heat and cool
attributes will be isBool instead of isKResult. This particular example
does not really require use of the result attribute, but if the user wishes
to evaluate a term of sort KResult further, the result attribute would be
required.

hybrid attribute

In certain situations, it is desirable to treat a particular production which
has the strict attribute as a result if the term has had its arguments fully
evaluated. This can be accomplished by means of the hybrid attribute:

syntax KResult ::= Bool

syntax BExp ::= Bool
              | BExp "||" BExp [strict(1), hybrid]

This attribute is equivalent in this case to the following additional axiom
being added to the definition of isKResult:

rule isKResult(BE1:BExp || BE2:BExp) => true requires isKResult(BE1)

Sometimes you wish to declare a production hybrid with respect to a predicate
other than isKResult. You can do this by specifying a sort as the body of the
hybrid attribute, e.g.:

syntax BExp ::= BExp "||" BExp [strict(1), hybrid(Foo)]

generates the rule:

rule isFoo(BE1:BExp || BE2:BExp) => true requires isFoo(BE1)

Properly speaking, hybrid takes an optional comma-separated list of sort
names. If the list is empty, the attribute is equivalent to hybrid(KResult).
Otherwise, it generates hybrid predicates for exactly the sorts named.

Context aliases

Sometimes it is necessary to define a fairly complicated evaluation strategy
for a lot of different operators. In this case, the user could simply write
a number of complex context declarations, but this quickly becomes
tedious. For this purpose, K has a concept called a context alias. A context
alias is a bit like a template for describing contexts. The template can then
be instantiated against particular productions using the strict and
seqstrict attributes.

Here is a (simplified) example taken from the K semantics of C++:

context alias [c]: <k> HERE:K ... </k> <evaluate> false </evaluate>
context alias [c]: <k> HERE:K ... </k> <evaluate> true </evaluate> [result(ExecResult)]

syntax Expr ::= Expr "=" Init [strict(c; 1)]

This defines the evaluation strategy during the translation phase of a C++
program for the assignment operator. It is equivalent to writing the following
context declarations:

context <k> HOLE:Expr = I:Init ... </k> <evaluate> false </evaluate>
context <k> HOLE:Expr = I:Init ... </k> <evaluate> true </evaluate> [result(ExecResult)]

What this is saying is, if the evaluate cell is false, evaluate the term
like normal to a KResult. But if the evaluate cell is true, instead
evaluate it to the ExecResult sort.

Essentially, we have given a name to this evaluation strategy in the form of
the rule label on the context alias sentences (in this case, c). We can
then say that we want to use this evaluation strategy to evaluate particular
arguments of particular productions by referring to it by name in a strict
attribute. For example, strict(c) will instantiate these contexts once for
each argument of the production, whereas strict(c; 1) will instantiate it
only for the first argument. The special variable HERE is used to tell the
compiler where you want to place the production that is to be heated or cooled.

You can also specify multiple context aliases for different parts of a production,
for example:

syntax Exp ::= foo(Exp, Exp) [strict(left; 1; right; 2)]

This says that we can evaluate the left and right arguments in either order, but to evaluate
the left using the left context alias and the right using the right context alias.

We can also say seqstrict(left; 1; right; 2), in which case we additionally must evaluate
the left argument before the right argument. Note, all strict positions are considered collectively
when determining the evaluation order of seqstrict or the hybrid predicates.

A strict attribute with no rule label associated with it is equivalent to
a strict attribute given with the following context alias:

context alias [default]: <k> HERE:K ... </k>

One syntactic convenience that is provided is that if you wish to declare the following context:

context foo(HOLE => bar(HOLE))

you can simply write the following:

syntax Foo ::= foo(Bar) [strict(alias)]

context alias [alias]: HERE [context(bar)]

Pattern Matching

As Patterns

New syntax has been added to K for matching a pattern and binding the resulting
match in its entirety to a variable.

The syntax is:

Pattern #as V::Var

In this case, Pattern, including any variables, is matched and the resulting
variables are added to the substitution if matching succeeds. Furthermore, the
term matched by Pattern is added to the substitution as V.

This code can also be used outside of any rewrite, in which case matching
occurs as if it appeared on the left hand side, and the right hand side becomes
a variable corresponding to the alias.

It is an error to use an as pattern on the right hand side of a rule.
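
As a minimal sketch (all sort and symbol names here are hypothetical), the
following rule destructures a pair to read its first component while also
keeping a handle on the whole pair:

```k
module AS-PATTERN-SKETCH  // hypothetical module name
  imports INT

  syntax Pair  ::= pair(Int, Int)
  syntax KItem ::= first(Pair) | seen(Pair, Int)

  // P is bound to the whole matched pair, X to its first component.
  rule <k> first(pair(X, _) #as P::Pair) => seen(P, X) ... </k>
endmodule
```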

Record-like KApply Patterns

We have added a syntax for matching on KApply terms which mimics the record
syntax in functional languages. This allows us to more easily express patterns
involving a KApply term in which we don't care about some or most of the
children, without introducing a dependency into the code on the number of
arguments which could be changed by a future refactoring.

The syntax is:

record(... field1: Pattern1, field2: Pattern2)

Note that this only applies to productions that are prefix productions.
A prefix production is considered by the implementation to be any production
whose production items match the following regular expression:

(Terminal(_)*) Terminal("(")
(NonTerminal (Terminal(",") NonTerminal)* )?
Terminal(")")

In other words, any sequence of terminals followed by an open parenthesis, an
optional comma separated list of non-terminals, and a close parenthesis.

If a prefix production has no named nonterminals, a record(...) syntax is
allowed, but in order to reference specific fields, it is necessary to give one
or more of the non-terminals in the production names.
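
For illustration, here is a sketch with a hypothetical person production whose
non-terminals are all named. The rules mention only the fields they need, in
the order in which those fields appear in the production, and would not need
to change if further fields were added later.

```k
module RECORD-SKETCH  // hypothetical module name
  imports INT
  imports STRING

  // Naming the non-terminals makes them addressable as fields.
  syntax Person ::= person(name: String, age: Int, city: String)

  syntax Int ::= ageOf(Person) [function]
  // Matches any person(...) term and binds only the age field.
  rule ageOf(person(... age: A)) => A

  syntax String ::= describe(Person) [function]
  // Two fields, listed in production order (name before city).
  rule describe(person(... name: N, city: C)) => N +String C
endmodule
```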

Note: because the implementation currently creates one production per possible
set of fields to match on, and because all possible permutations of all
possible subsets of a list of n elements is a number that scales factorially
and reaches over 100 thousand productions at n=8, we currently do not allow
fields to be matched in any order like a true record, but only in the same
order as appears in the production itself.

Given that this only reduces the number of productions to the size of the power
set, this will still explode the parsing time if we create large productions of
10 or more fields that all have names. This is something that should probably
be improved; however, productions with an arity that large are rare, and
thus it has not been viewed as a priority.
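
As a quick check of the figures in the note above (a worked calculation, not
taken from the implementation), the number of ordered selections drawn from
any subset of n fields, and the number of subsets alone, are:

```latex
\[
  \sum_{k=0}^{n} \frac{n!}{(n-k)!},
  \qquad\text{which for } n = 8 \text{ gives }
  1 + 8 + 56 + 336 + 1680 + 6720 + 20160 + 40320 + 40320 = 109601 .
\]
\[
  \text{Restricting to production order leaves one production per subset: }
  2^{n}, \text{ e.g. } 2^{8} = 256 \text{ and } 2^{10} = 1024 .
\]
```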

Or Patterns

Sometimes you wish to express that a rule should match if one out of multiple
patterns should match the same subterm. We can now express this in K by means
of using the #Or ML connective on the left hand side of a rule.

For example:

rule foo #Or bar #Or baz => qux

Here any of foo, bar, or baz will match this rule. Note that the behavior is
ill-defined if it is not the case that all the clauses of the or have the same
bound variables.
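
A sketch of a well-formed or pattern, in which every clause binds the same
variable, might look like this (the sorts and symbols are hypothetical):

```k
module OR-PATTERN-SKETCH  // hypothetical module name
  imports INT

  syntax Shape ::= circle(Int) | square(Int)
  syntax Int ::= size(Shape) [function]

  // Both clauses bind exactly the same variable, N.
  rule size(circle(N) #Or square(N)) => N
endmodule
```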

Matching global context in function rules

On occasion it is highly desirable to be able to look up information from the
global configuration and match against it when evaluating a function. For this
purpose, we introduce a new syntax for function rules.

This syntax allows the user to match on function context from within a
function rule:

syntax Int ::= foo(Int) [function]

rule [[ foo(0) => I ]]
     <bar> I </bar>

rule something => foo(0)

This is completely desugared by the K frontend and does not require any special
support in the backend. It is an error to have a rewrite inside function
context, as we do not currently support propagating such changes back into the
global configuration. It is also an error if the context is not at the top
level of a rule body.

Desugared code:

syntax Int ::= foo(Int, GeneratedTopCell) [function]

rule foo(0, <generatedTop>
              <bar> I </bar>
              ...
            </generatedTop> #as Configuration) => I
rule <generatedTop>
       <k> something ... </k>
       ...
     </generatedTop> #as Configuration
  => <generatedTop>
       <k> foo(0, Configuration) ... </k>
       ...
     </generatedTop>

Collection patterns

It is allowed to write patterns on the left hand side of rules which refer to
complex terms of sort Map, List, and Set, despite these patterns ostensibly
breaking the rule that terms which are functions should not appear on the left
hand side of rules. Such terms are destructured into pattern matching
operations.

The following forms are allowed:

// 0 or more elements followed by 0 or 1 variables of sort List followed by
// 0 or more elements
ListItem(E1) ListItem(E2) L:List ListItem(E3) ListItem(E4)

// the empty list
.List

// 1 or more list update operations applied to a variable
L:List [ K1 <- E1 ] [ K2 <- E2 ]

// 0 or more elements in any order plus 0 or 1 variables of sort Set
// in any order
SetItem(K1) SetItem(K2) S::Set SetItem(K3) SetItem(K4)

// the empty set
.Set

// 0 or more elements in any order plus 0 or 1 variables of sort Map
// in any order
K1 |-> E1 K2 |-> E2 M::Map K3 |-> E3 K4 |-> E4

// the empty map
.Map

Here K1, K2, K3, K4 etc can be any pattern except a pattern containing both
function symbols and unbound variables. An unbound variable is a variable whose
binding cannot be determined by means of decomposing non-set-or-map patterns or
map elements whose keys contain no unbound variables.

This is determined recursively, i.e., the term K1 |-> E2 E2 |-> E3 E3 |-> E4 is
considered to contain no unbound variables.

Note that in the pattern K1 |-> E2 K3 |-> E4 E4 |-> E5, K1 and K3 are
unbound, but E4 is bound because it is bound by deconstructing the key K3, even
though K3 is itself unbound.

In the above examples, E1, E2, E3, and E4 can be any pattern that is normally
allowed on the lhs of a rule.

When a map, set, or list key contains function symbols, we know that the
variables in that key are bound (because of the above restriction), so it is
possible to evaluate the function to a concrete term prior to performing the
lookup.

Indeed, this is the precise semantics which occurs; the function is evaluated
and the result is looked up in the collection.

For example:

syntax Int ::= f(Int) [function]
rule f(I:Int) => I +Int 1
rule <k> I:Int => . ... </k> <state> ... SetItem(f(I)) ... </state>

This will rewrite I to . if and only if the state cell contains
I +Int 1.

Note that in the case of Set and Map, one guarantee is that K1, K2, K3, and K4
represent distinct elements. Pattern matching fails if the correct number of
distinct elements cannot be found.
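
The Map form works the same way as the Set example above. In the following
sketch (the cell names, the lookup symbol, and the module name are
hypothetical), the key contains a function call over a variable that is
already bound by the k cell, so the key is evaluated first and the result is
then looked up in the map:

```k
module MAP-PATTERN-SKETCH  // hypothetical module name
  imports INT
  imports MAP

  configuration <T>
                  <k> $PGM:K </k>
                  <state> .Map </state>
                </T>

  syntax KItem ::= lookup(Int)

  // Matches only if the map has an entry whose key equals I +Int 1;
  // the function call in the key is evaluated before the lookup.
  rule <k> lookup(I:Int) => V ... </k>
       <state> ... (I +Int 1) |-> V ... </state>
endmodule
```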

Matching on cell fragments

K allows matching fragments of the configuration and using them to construct
terms and use as function parameters.

configuration <t>
                <k> #init ~> #collectOdd ~> $PGM </k>
                <fs>
                  <f multiplicity="*" type="Set"> 1 </f>
                </fs>
              </t>

The #collectOdd construct grabs the entire content of the <fs> cell.
We may also match on only a portion of its content. Note that the fragment
must be wrapped in a <fs> cell at the call site.

syntax KItem ::= "#collectOdd"
rule <k> #collectOdd => collectOdd(<fs> Fs </fs>) ... </k>
     <fs> Fs </fs>

The collectOdd function collects the items it needs:

syntax Set ::= collectOdd(FsCell) [function]
rule collectOdd(<fs> <f> I </f> REST </fs>) => SetItem(I) collectOdd(<fs> REST </fs>) requires I %Int 2 ==Int 1
rule collectOdd(<fs> <f> I </f> REST </fs>) =>            collectOdd(<fs> REST </fs>) requires I %Int 2 ==Int 0
rule collectOdd(<fs> .Bag </fs>) => .Set

all-path and one-path attributes to distinguish reachability claims

As the Haskell backend can handle both one-path and all-path reachability
claims, but both these are encoded as rewrite rules in K, these attributes can
be used to clarify what kind of claim a rule is.

In addition to annotating an individual rule with one of them
(if a rule is annotated with both, only one of them will be chosen),
one can also annotate whole modules to give a default claim type for all rules
in that module.

Additionally, the Haskell backend introduces an extra command line option
for the K frontend, --default-claim-type, with possible values
all-path and one-path to allow choosing a default type for all
claims.

Set Variables

Motivation

Set variables were introduced as part of Matching Mu Logic, the mathematical
foundations for K. In Matching Mu Logic, terms evaluate to sets of values.
This is useful for both capturing partiality (as in 3/0) and capturing
non-determinism (as in 3 #Or 5). Consequently, symbol interpretation is
extended to have a collective interpretation over sets of input values.

Usually, K rules are given using regular variables, which expect that the term
they match is both defined and has a unique interpretation.

However, it is sometimes useful to have simplification rules which work over
any kind of pattern, be it undefined or non-deterministic. This behavior can be
achieved by using set variables to stand for any kind of pattern.

Syntax

Any variable prefixed by @ will be considered a set variable.

Example

Below is a simplification rule which motivated this extension:

  rule #Ceil(@I1:Int /Int @I2:Int) =>
    {(@I2 =/=Int 0) #Equals true} #And #Ceil(@I1) #And #Ceil(@I2)
    [anywhere]

This rule basically says that @I1:Int /Int @I2:Int is defined if @I1 and
@I2 are defined and @I2 is not 0. Using set variables here is important as
it allows the simplification rule to apply to any symbolic patterns, without
caring whether they are defined or not.

This allows simplifying the expression #Ceil((A:Int /Int B:Int) /Int C:Int) to:

{(C =/=Int 0) #Equals true} #And #Ceil(C) #And ({(B =/=Int 0) #Equals true}
#And #Ceil(B) #And #Ceil(A))

See kframework/kore#729 for
more details.

SMT Translation

K makes queries to an SMT solver (Z3) to discharge proof obligations when doing
symbolic execution. You can control how these queries are made using the
attributes smtlib, smt-hook, and smt-lemma on declared productions.
These attributes guide the prover when it tries to apply rules to discharge a
proof obligation.

  • smt-hook(...) allows you to specify a term in SMTLIB2 format which should
    be used to encode that production, and assumes that all symbols appearing in
    the term are already declared by the SMT solver.
  • smtlib(...) allows you to declare a new SMT symbol to be used when that
    production is sent to Z3, and gives it uninterpreted function semantics.
  • smt-lemma can be applied to a rule to encode it as a conditional equality
    when sending queries to Z3. A rule of the form rule LHS => RHS requires REQ
    will be encoded as the conditional equality (=> REQ (= LHS RHS)). Every
    symbol present in the rule must have an smt-hook(...) or smtlib(...)
    attribute.

For example:

syntax Int ::= "~Int" Int          [function, klabel(~Int_), symbol,
                                    smtlib(notInt)]
             | Int "^%Int" Int Int [function, klabel(_^%Int__), symbol,
                                    smt-hook((mod (^ #1 #2) #3))]

In the example above, we declare two productions ~Int_ and _^%Int__, and
tell the SMT solver to:

  • use uninterpreted function semantics for ~Int_ via SMTLIB2 symbol
    notInt, and
  • use the SMTLIB2 term (mod (^ #1 #2) #3) (where #N marks the Nth
    non-terminal argument position of the production) for _^%Int__, where mod
    and ^ are already declared by the SMT solver.

Caution

Set variables are currently only supported by the Haskell backend.
The use of rules with set variables should be sound for all other backends
which just execute by rewriting, however it might not be safe for backends
which want to guarantee coverage.

Variables occurring only in the RHS of a rule

This section presents possible scenarios requiring variables to only appear in
the RHS of a rule.

Summary

Except for ? variables and ! (fresh) variables, which are
required to only appear in the RHS of a rule, all other variables must
also appear in the LHS of a rule. This restriction also applies to anonymous
variables; in particular, for claims, ?_ (not _) should be used in the RHS
to indicate that something changes but we don't care to what value.

To support specifying random-like behavior, the above restriction can be relaxed
by annotating a rule with the unboundVariables attribute whenever the rule
intentionally contains regular variables only occurring in the RHS.

Introduction

K uses question mark variables of the form ?X to refer to
existential variables, and uses ensures to specify logical constraints on
those variables.
These variables are only allowed to appear in the RHS of a K rule.

If the rules represent rewrite (semantic) steps or verification claims,
then the ? variables are existentially quantified at the top of the RHS;
otherwise, if they represent equations, the ? variables are quantified at the
top of the entire rule.

Note that when both ?-variables and regular variables are present,
regular variables are (implicitly) universally quantified on top of the rule
(already containing the existential quantifications).
This essentially makes all ? variables depend on all regular variables.

All examples below are intended more for program verification /
symbolic execution, and thus concrete implementations might choose to ignore
them altogether or to provide ad-hoc implementations for them.

Example: Verification claims

Consider the following definition of a (transition) system:

module A
  rule foo => true
  rule bar => true
  rule bar => false
endmodule

Consider also the following specification of claims about the definition above:

module A-SPEC
  rule [s1]: foo => ?X:Bool
  rule [s2]: foo =>  X:Bool  [unboundVariables(X)]
  rule [s3]: bar => ?X:Bool
  rule [s4]: bar =>  X:Bool  [unboundVariables(X)]
endmodule
One-path interpretation
  • (s1) says that there exists a path from foo to some boolean, which is
    satisfied easily using the foo => true rule
  • (s3) says the same thing about bar and can be satisfied by either of
    bar => true and bar => false rules
  • (s2) and (s4) can be better understood by replacing them with instances for
    each element of type Bool, which can be interpreted as saying that
    both true and false are reachable from foo for (s2), or bar for (s4),
    respectively.
    • (s2) cannot be verified as we cannot find a path from foo to false.
    • (s4) can be verified by using bar => true to show true is reachable and
      bar => false to achieve the same thing for false
All-path interpretation
  • (s1) says that all paths from foo will reach some boolean, which is
    satisfied by the foo => true rule and the lack of other rules for foo

  • (s3) says the same thing about bar and can be satisfied by checking that
    both bar => true and bar => false end in a boolean, and there are no
    other rules for bar

  • (s2) and (s4) can be better understood by replacing them with instances for
    each element of type Bool, which can be interpreted as saying that
    both true and false are reachable in all paths originating in
    foo for (s2), or bar for (s4), respectively.
    This is a very strong claim, requiring that all paths originating in
    foo (bar) pass through both true and false,
    so neither (s2) nor (s4) can be verified.

    Interestingly enough, adding a rule like false => true would make both
    (s2) and (s4) hold.
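
The verdicts for the four claims over the original module A can be reproduced with a small Python model. This is a sketch under stated assumptions, not how the prover works: the transition system is finite and acyclic, a claim with ?X is read as "some boolean is reached", a claim with an unbound X is read as one instance per boolean, and the all-path reading requires every maximal path to pass through the target. All function names are made up for this illustration.

```python
# Toy model of module A: each term maps to the terms it can rewrite to.
RULES = {"foo": ["true"], "bar": ["true", "false"]}
BOOLS = ["true", "false"]

def maximal_paths(start):
    """Enumerate every path from start that cannot be extended (finite, acyclic systems only)."""
    successors = RULES.get(start, [])
    if not successors:
        return [[start]]
    return [[start] + rest for nxt in successors for rest in maximal_paths(nxt)]

def reaches(path, targets):
    return any(term in targets for term in path)

def one_path(start, targets):
    return any(reaches(path, targets) for path in maximal_paths(start))

def all_path(start, targets):
    return all(reaches(path, targets) for path in maximal_paths(start))

def holds(start, existential, interpretation):
    if existential:
        # ?X:Bool: the target is "some boolean".
        return interpretation(start, set(BOOLS))
    # X:Bool with unboundVariables: one instance of the claim per boolean.
    return all(interpretation(start, {b}) for b in BOOLS)

CLAIMS = {"s1": ("foo", True), "s2": ("foo", False),
          "s3": ("bar", True), "s4": ("bar", False)}

for name, (start, existential) in CLAIMS.items():
    print(name, holds(start, existential, one_path), holds(start, existential, all_path))
```

Each printed line shows a claim followed by its one-path and all-path verdicts in this model.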

Example: Random Number Construct rand()

The random number construct rand() is a language construct which could be
easily conceived to be part of the syntax of a programming language:

Exp ::= "rand" "(" ")"

The intended semantics of rand() is that it can rewrite to any integer in
a single step. This could be expressed as the following infinitely
many rules.

rule  rand() => 0
rule  rand() => 1
rule  rand() => 2
  ...    ...
rule rand() => (-1)
rule rand() => (-2)
  ...    ...

Since we need an instance of the rule for every integer, one could summarize
the above infinitely many rules with the rule

rule rand() => I:Int [unboundVariables(I)]

Note that I occurs only in the RHS in the rule above, and thus the rule
needs the unboundVariables(I) attribute to signal that this is intentional.

One can define variants of rand() by further constraining the output variable
as a precondition to the rule.

Rand-like examples
  1. randBounded(M,N) can rewrite to any integer between M and N

    syntax Exp ::= randBounded(Int, Int)
    rule randBounded(M, N) => I
      requires M <=Int I andBool I <=Int N
      [unboundVariables(I)]
    
  2. randInList(Is) takes a list Is of items
    and can rewrite in one step to any item in Is.

    syntax Exp ::= randInList (List)
    rule randInList(Is) => I
      requires I inList Is
      [unboundVariables(I)]
    
  3. randNotInList(Is) takes a list Is of items
    and can rewrite in one step to any item not in Is.

    syntax Exp ::= randNotInList (List)
    rule randNotInList(Is) => I
      requires notBool(I inList Is)
      [unboundVariables(I)]
    
  4. randPrime() can rewrite to any prime number.

    syntax Exp ::= randPrime ()
    rule randPrime() => X:Int
      requires isPrime(X)
      [unboundVariables(X)]
    

    where isPrime(_) is a predicate that can be defined in the usual way.
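
One way to read these rules is that the requires clause carves out the set of values the construct may rewrite to. The Python sketch below makes that reading concrete over a finite window of integers; the window, the helper admissible_results and the is_prime definition are assumptions for illustration only.

```python
def admissible_results(candidates, requires):
    """Return the candidates that the rule's requires clause allows as a result."""
    return [i for i in candidates if requires(i)]

def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

window = range(-10, 20)  # finite stand-in for Int, chosen only for this sketch

rand_bounded = admissible_results(window, lambda i: 2 <= i <= 5)          # randBounded(2, 5)
rand_in_list = admissible_results(window, lambda i: i in [3, 7, 11])      # randInList
rand_not_in_list = admissible_results(window, lambda i: i not in [3, 7])  # randNotInList
rand_prime = admissible_results(window, is_prime)                         # randPrime()

print(rand_bounded, rand_in_list, rand_prime)
```

A rule with unboundVariables may rewrite to any member of the corresponding set, which is why a concrete backend would need some strategy for picking one.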

Note 1: None of the above are function symbols; they are language constructs.

Note 2: Currently the frontend rejects rules with universally quantified
variables in the RHS which are not bound in the LHS, unless the rule carries
the unboundVariables attribute.

Note 3: Allowing these rules in a concrete execution engine would require an
algorithm for generating concrete instances for such variables, satisfying the
given constraints; thus the unboundVariables attribute serves two purposes:

  • to allow such rules to pass the variable checks, and
  • to signal (concrete execution) backends that a specialized algorithm would
    be needed to instantiate these variables.

Example: Fresh Integer Construct fresh(Is)

The fresh integer construct fresh(Is) is a language construct.

Exp ::= ... | "fresh" "(" List{Int} ")"

The intended semantics of fresh(Is) is that it can always rewrite to an
integer that is not in Is.

Note that fresh(Is) and randNotInList(Is) are different; the former
does not need to be able to rewrite to every integer not in Is,
while the latter must be able to do so.

For example, it is correct to implement fresh(Is) so it always returns the
smallest positive integer that is not in Is, but the same implementation for
randNotInList(Is) might be considered inadequate.
In other words, there exist multiple correct implementations of fresh(Is),
some of which may be deterministic, but there only exists a unique
implementation of randNotInList(Is).
Finally, note that randNotInList(Is) is a correct implementation
for fresh(Is); hence, concrete execution engines can choose to handle
such rules accordingly.
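
The deterministic implementation mentioned above can be sketched in Python as follows. The function names are hypothetical, and the check at the end only tests the freshness condition, not anything about how a K backend would realize it.

```python
def fresh_smallest(items):
    """Return the smallest positive integer that does not occur in items."""
    taken = set(items)
    candidate = 1
    while candidate in taken:
        candidate += 1
    return candidate

def satisfies_fresh(result, items):
    """The only obligation on fresh(Is): the result must not be in Is."""
    return result not in items

used = [1, 2, 4]
value = fresh_smallest(used)
print(value, satisfies_fresh(value, used))
```

This function is an acceptable fresh(Is) because it always yields some integer outside the list. It would fall short as randNotInList(Is), since it can only ever produce one of the many integers that construct must be able to rewrite to.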

We use the following K syntax to define fresh(Is)

syntax Exp ::= fresh (List{Int})
rule fresh(Is:List{Int}) => ?I:Int
  ensures notBool (?I inList{Int} Is)

A variant of this would be a choiceInList(Is) language construct which would
choose some number from a list:

syntax Exp ::= choiceInList (List{Int})
rule choiceInList(Is:List{Int}) => ?I:Int
  ensures ?I inList{Int} Is

Note: This definition is different from one using a ! variable to indicate
freshness because using ! is just syntactic sugar for generating globally
unique instances and relies on a special configuration cell, and cannot be
constrained, while the fresh described here is local and can be constrained.
While the first is more appropriate for concrete execution, this might be
better for symbolic execution / program verification.

Example: Arbitrary Number (Unspecific Function) arb()

The function arb() is not a PL construct, but a mathematical function.
Therefore, its definition should not be interpreted as an execution step, but
rather as an equality.

The intended semantics of arb() is that it is an unspecified nullary function.
The exact return value of arb() is unspecified in the semantics but up to the
implementations.
However, being a mathematical function, arb() must return the same value in
any one implementation.

We do not need special frontend syntax to define arb().
We only need to define it in the usual way as a function
(instead of a language construct), and provide no axioms for it.
The total attribute ensures that the function is total, i.e.,
that it evaluates to precisely one value for each input.

Variants

There are many variants of arb(). For example, arbInList(Is) is
an unspecified function whose return value must be an element from Is.

Note that arbInList(Is) is different from choiceInList(Is), because
choiceInList(Is) transitions to an integer in Is (could be a different one
each time it is used), while arbInList(Is) is equal to a (fixed)
integer in Is.

W.r.t. the arb variants, we can use ? variables and the function
annotation to signal that we're defining a function and the value of the
function is fixed, but non-determinate.

syntax Int ::= arbInList(List{Int}) [function]
rule arbInList(Is:List{Int}) => ?I:Int
  ensures ?I inList{Int} Is

If elimination of existentials in equational rules is needed, one possible
approach would be through Skolemization,
i.e., replacing the ? variable with a new uninterpreted function depending
on the regular variables present in the function.

Example: Interval (Non-function Symbols) interval()

The symbol interval(M,N) is not a PL construct, nor a function in the
first-order sense, but a proper matching-logic symbol, whose interpretation is
in the powerset of its domain.
Its axioms will not use rewrites but equalities.

The intended semantics of interval(M,N) is that it equals the set of
integers that are larger than or equal to M and smaller than or equal to N.

Since expressing the axiom for interval requires an existential
quantification on the right-hand-side, thus making it a non-total symbol
defined through an equation, using ? variables might be confusing since their
usage would be different from that presented in the previous sections.

Hence, the proposal to support this would be to write this as a proper ML rule.
A possible syntax for this purpose would be:

eq  interval(M,N)
    ==
    #Exists X:Int .
        (X:Int #And { X >=Int M #Equals true } #And { X <=Int N #Equals true })

Additionally, the symbol declaration would require a special attribute to
signal the fact that it is not a constructor but a defined symbol.
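
The set-valued reading of interval(M,N) can be illustrated with a short Python function. This is only a model of the intended interpretation over concrete integers, not a proposal for K syntax.

```python
def interval(m, n):
    """All integers X with m <= X and X <= n, as a set."""
    return set(range(m, n + 1))

print(interval(2, 5))
print(interval(5, 2))
```

The result has several elements when M is below N and is empty when M exceeds N, so the symbol does not behave like a function in the first-order sense.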

Since this feature is not clearly needed by K users at the moment, it is only
presented here as an example; its implementation will be postponed for such time
when its usefulness becomes apparent.

Parser Generation

In addition to on-the-fly parser generation using kast, K is capable of
ahead-of-time parser generation of LR(1) or GLR parsers using Flex and Bison.
This can be done in one of two ways.

  1. You can explicitly request for a particular parser to be generated by
    invoking kast --gen-parser <outputFile> or
    kast --gen-glr-parser <outputFile> respectively. kast will then create a
    parser based on the same command line flags that govern on-the-fly parsing,
    like -s to specify the starting sort, and -m to specify the module to
    parse under. By default, this generates a parser for the sort of the $PGM
    configuration variable in the main syntax module of the definition.
  2. You can request that a specific set of parsers be generated for all the
    configuration variables of your definition by passing the
    --gen-bison-parser or --gen-glr-bison-parser flags to kompile.
    kompile will decide the sorts to use as start symbols based on the sorts
    in the configuration declaration for the configuration variables. The $PGM
    configuration variable will be generated based on the main syntax module
    of the definition. The user must explicitly annotate the configuration
    declaration with the other modules to use to parse the other configuration
    variables as attributes. For example, given the following cell in the
    configuration declaration: <cell> foo($FOO:Foo, $BAR:Bar) </cell>,
    one might annotate it with the attribute pair parser="FOO, TEST; BAR, TEST2"
    to indicate that configuration variable $FOO should be parsed in the
    TEST module, and configuration variable $BAR should be parsed in the
    TEST2 module. If the user forgets to annotate the declaration with the
    parser attribute, only the $PGM parser will be generated.
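
The shape of the parser attribute value can be made explicit with a small Python sketch that splits it into a mapping from configuration variable to module. The function name is hypothetical and the parsing is deliberately naive; it only reflects the "VAR, MODULE; VAR, MODULE" layout shown in the example above.

```python
def parse_parser_attribute(value):
    """Split 'FOO, TEST; BAR, TEST2' into {'$FOO': 'TEST', '$BAR': 'TEST2'}."""
    mapping = {}
    for pair in value.split(";"):
        variable, module = (part.strip() for part in pair.split(","))
        mapping["$" + variable] = module
    return mapping

print(parse_parser_attribute("FOO, TEST; BAR, TEST2"))
```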

Bison-generated parsers are extremely fast compared to kast, but they have
some important limitations:

  • Bison parsers will always output Kore. You can then pass the resulting AST
    directly to llvm-krun or kore-exec and bypass the krun frontend, making
    them very fast, but lower-level.
  • Bison parsers do not yet support macros. This may change in a future release.
    Note that you can use anywhere rules instead of macros in most cases to get
    around this limitation, although they will not benefit from unparsing via the
    alias attribute.
  • Obligation falls on the user to ensure that the grammar they write is LR(1)
    if they choose to use LR(1) parsing. If this does not happen, the parser
    generated will have shift/reduce or reduce/reduce conflicts and the parser
    may behave differently than kast would (kast is a GLL parser, i.e., it
    is based on LL parsers and parses all unambiguous context-free grammars).
    K provides an attribute, not-lr1, which can be applied to modules known to
    not be LR(1), and will trigger a warning if the user attempts to generate an
    LR(1) parser which recursively imports that module.
  • If you are using LR(1) based parsing, the prefer and avoid attributes are
    ignored. It is only possible to implement these attributes by means of
    generalized LL or LR parsing and a postprocessing on the AST to remove the
    undesirable ambiguity.
  • Obligation falls on the user to ensure that the grammar they write has as
    few conflicts as possible if they are using GLR parsing. Bison's GLR support
    is quite primitive, and in the worst case it can use exponential space and
    time to parse a program, which generally leads the generated parser to report
    "memory exhausted", indicating that the parse could not be completed within
    the stack space allocated by Bison. It's best to ensure that the grammar is
    as close to LR(1) as possible and only utilizes conflicts where absolutely
    necessary. One tool that can be used to facilitate this is to pass
    --bison-lists to kompile. This will disable support for the List{Sort}
    syntax production, and it will make NeList{Sort} left associative, but the
    resulting productions generated for NeList{Sort} will be LR(1) and use bounded
    stack space.
  • If the grammar you are parsing is context-sensitive (for example, because
    it requires a symbol table to parse), one thing you can do to make this
    language parse in K is to implement the language as an ambiguous grammar.
    Bison's GLR parser will generate an amb production that is parametric in
    the sort of the ambiguity. You can then import the K-AMBIGUITIES module
    and use rewriting to resolve the ambiguities using whatever preprocessing
    mechanisms you prefer.

Location Information

K is able to insert file, line, and column metadata into the parse tree on a
per-sort basis when parsing using a bison-generated parser. To enable this,
mark the sort with the locations attribute.

  syntax Exp [locations]
  syntax Exp ::= Exp "/" Exp | Int

K implicitly wraps productions of these sorts in a #location term (see the
K-LOCATIONS module in kast.md). The metadata can thus be accessed with
ordinary rewrite rules:

  rule #location(_ / 0, File, StartLine, _StartColumn, _EndLine, _EndColumn) =>
  "Error: Division by zero at " +String File +String ":" +String Int2String(StartLine)

Sometimes it is desirable to allow code to be written in a file which
overwrites the current location information provided by the parser. This can be
done via a combination of the #LineMarker sort and the --bison-file flag to
the parser generator. If you declare a production of sort #LineMarker which
contains a regular expression terminal, this will be treated as a
line marker by the bison parser. The user will then be expected to provide
an implementation of the parser for the line marker in C. The function expected
by the parser has the signature void line_marker(char *, yyscan_t), where
yyscan_t is a
reentrant flex scanner.
The string value of the line marker token as specified by your regular
expression can be found in the first parameter of the function, and you can
set the line number used by the scanner using yyset_lineno(int, yyscan_t). If
you declare the variable extern char *filename, you can also set the current
file name by writing a malloc'd, zero-terminated string to that variable.

Unparsing

A number of factors go into how terms are unparsed in K. Here we describe some
of the features the user can use to control how unparsing happens.

Brackets

One of the phases that the unparser goes through is to insert productions
tagged with the bracket attribute where it believes this is necessary
in order to create a correct string that will be parsed back into the original
AST. The most common case of this is in expression grammars. For example,
consider the following grammar:

syntax Exp ::= Int
             | Exp "*" Exp
             > Exp "+" Exp

Here we have declared that expressions can contain integer addition and
multiplication, and that multiplication binds tighter than addition. As a
result, when writing a program, if we want to write an expression that first
applies addition, then multiplication, we must use brackets: (1 + 2) * 3.
Similarly, if we have such an AST, we must insert brackets into the AST
in order to faithfully unparse the term in a manner that will be parsed back
into the same AST, because if we do not, we end up unparsing the term as
1 + 2 * 3, which will be parsed back as 1 + (2 * 3) because of the priority
declaration in the grammar.

You can control how the unparser will insert such brackets by adding a
production with the bracket attribute and the correct sort. For example, if,
instead of parentheses, you want to use curly braces, you could write:

syntax Exp ::= "{" Exp "}" [bracket]

This would signal to the unparser how brackets should look for terms of sort
Exp, and it will use this syntax when unparsing terms of sort Exp.
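
The bracket-insertion idea can be sketched in Python for the two-operator grammar above. This is a simplified model, not the unparser's algorithm: terms are nested tuples, a child is bracketed only when its operator binds more loosely than its parent's, and associativity is not modeled. The PRIORITY table and function name are assumptions of this sketch.

```python
PRIORITY = {"*": 1, "+": 2}  # lower number binds tighter, as in the grammar above

def unparse(term, brackets=("(", ")")):
    """Print a term, adding brackets only where the priorities require them."""
    if isinstance(term, int):
        return str(term)
    op, left, right = term

    def child(sub):
        text = unparse(sub, brackets)
        # Bracket a child whose operator binds more loosely than the parent's.
        if not isinstance(sub, int) and PRIORITY[sub[0]] > PRIORITY[op]:
            return brackets[0] + text + brackets[1]
        return text

    return f"{child(left)} {op} {child(right)}"

print(unparse(("*", ("+", 1, 2), 3)))
print(unparse(("+", 1, ("*", 2, 3))))
print(unparse(("*", ("+", 1, 2), 3), brackets=("{", "}")))
```

The brackets parameter plays the role of the production tagged with the bracket attribute: changing it changes the delimiters without touching the rest of the printer.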

Commutative collections

One thing that K will do (unless you pass the --no-sort-collections flag to
krun) is to sort associative, commutative collections (such as Set and Map)
alphanumerically. For example, if I have a collection whose keys are sort Id
and they have the values a, b, c, and d, then unparsing will always print
first the key a, then b, then c, then d, because this is the alphabetic order
of these keys when unparsed.

Furthermore, K will sort numeric keys numerically. For example, if I have a
collection whose keys are 1, 2, 5, 10, 30, it will first display 1, then 2,
then 5, then 10, then 30, because it will sort these keys numerically. Note
that this is different than an alphabetic sort, which would sort them as
1, 10, 2, 30, 5. We believe the former is more intuitive to users.
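
The two orderings can be compared directly in Python. The helper below is a sketch that assumes keys arrive as their unparsed strings and that a collection is sorted numerically only when every key is an integer literal; how K treats mixed collections is not covered here.

```python
def sort_keys(keys):
    """Sort numerically if every key is an integer literal, alphabetically otherwise."""
    if all(key.lstrip("-").isdigit() for key in keys):
        return sorted(keys, key=int)
    return sorted(keys)

numeric_keys = ["30", "5", "1", "10", "2"]
print(sort_keys(numeric_keys))            # numeric order
print(sorted(numeric_keys))               # plain alphabetic order, for comparison
print(sort_keys(["d", "b", "a", "c"]))    # identifiers sort alphabetically
```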

Substitution filtering

K will remove substitution terms corresponding to anonymous variables when
using the --pattern flag if those anonymous variables provide no information
about the named variables in your search pattern. You can disable this behavior
by passing --no-substitution-filtering to krun. When this flag is not passed,
and you are using the Haskell backend, any equality in a substitution (i.e., an
#Equals under an #And under an #Or) will be hidden from the user if the
left hand side is a variable that was anonymous in the --pattern passed by
the user, unless that variable appears elsewhere in the substitution. If you
want to see that variable in the substitution, you can either disable this
filtering, or give that variable a name in the original search pattern.

Variable alpha renaming

K will automatically rename variables that appear in the output configuration.
Similar to commutative collections, this is done to normalize the resulting
configuration so that equivalent configurations will be printed identically
regardless of how they happen to be reached. This pass can be disabled by
passing --no-alpha-renaming to krun.
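
The normalization idea can be illustrated with a Python sketch that renames variables in order of first appearance. Both the naming scheme (V0, V1, ...) and the rule that every capitalized identifier is a variable are assumptions made for this example; they are not K's actual conventions.

```python
import re

def alpha_rename(text):
    """Rename capitalized identifiers to V0, V1, ... in order of first appearance."""
    names = {}

    def rename(match):
        # Reuse the name already assigned to this variable, or assign the next one.
        return names.setdefault(match.group(0), f"V{len(names)}")

    return re.sub(r"\b[A-Z][A-Za-z0-9]*\b", rename, text)

print(alpha_rename("f(X, Y, X)"))
print(alpha_rename("f(A, B, A)"))
```

Two terms that differ only in their choice of variable names come out identical, which is the property the renaming pass is meant to provide.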

Macro expansion

K will apply macros in reverse on the output configuration if the macro was
created with the alias or alias-rec attribute. See the section on macro
expansion for more details.

Formatting

format attribute

K allows you to control how terms are unparsed using the format attribute.
By default, a domain value is unparsed by printing its string value verbatim,
and an application pattern is unparsed by printing its terminals and children
in the sequence implied by its concrete syntax, separated by spaces. However,
K gives you complete control over how you want to unparse the symbol.

A format attribute is a string containing zero or more escape sequences that
tell K how to unparse the symbol. Escape sequences begin with a '%' and are
followed by either an integer, or a single non-digit character. Below is a
list of escape sequences recognized by the formatter:

Escape Sequence | Meaning
n               | Insert '\n' followed by the current indentation level
i               | Increase the current indentation level by 1
d               | Decrease the current indentation level by 1
c               | Move to the next color in the list of colors for this production
r               | Reset color to the default foreground color for the terminal (See below for more information on how colors work)
an integer      | Print a terminal or nonterminal from the production (See below for more information)
any other char  | Print that character verbatim

Using the integer escape sequence

In the integer escape sequence %a, the integer a is treated as a 1-based
index into the terminals and nonterminals of the production.

  • If the offset refers to a terminal, move to the next color in the list of
    colors for this production, print the value of that terminal, then reset the
    color to the default foreground color for the terminal.

  • If the offset refers to a regular expression terminal, it is an error.

  • If the offset refers to a nonterminal, print the unparsed representation of
    the corresponding child of the current term.
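
The escape sequences can be exercised with a small interpreter written in Python. This is a sketch rather than the formatter K uses: colors are ignored (%c and %r do nothing), one indentation level is assumed to be two spaces, text outside escape sequences is copied as is, and the production is represented as a plain list of already-unparsed strings.

```python
def apply_format(fmt, items, indent_unit="  "):
    """Interpret a format string against the items (terminals and children) of a production."""
    out = []
    level = 0
    i = 0
    while i < len(fmt):
        if fmt[i] != "%":
            out.append(fmt[i])  # ordinary text is copied verbatim
            i += 1
            continue
        i += 1
        if i >= len(fmt):
            raise ValueError("format string ends with a dangling %")
        if fmt[i].isdigit():
            j = i
            while j < len(fmt) and fmt[j].isdigit():
                j += 1
            out.append(items[int(fmt[i:j]) - 1])  # 1-based index into the production
            i = j
            continue
        code = fmt[i]
        i += 1
        if code == "n":
            out.append("\n" + indent_unit * level)
        elif code == "i":
            level += 1
        elif code == "d":
            level -= 1
        elif code in "cr":
            pass  # color handling is omitted in this sketch
        else:
            out.append(code)  # any other character is printed verbatim
    return "".join(out)

# Hypothetical production with items: "if", a condition, "{", a body, "}".
print(apply_format("%1 %2 %3%i%n%4%d%n%5", ["if", "x", "{", "y", "}"]))
```

The example prints the body on its own indented line between the braces, showing how %i, %n and %d combine.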

color and colors attributes

K allows you to take advantage of ANSI terminal codes for foreground color
in order to colorize output pretty-printed by the unparser. This is controlled
via the color and colors attributes of productions. These attributes
combine with the format attribute to control how a term is colorized.

The first thing to understand about how colorization works is that the color
and colors attributes are used to construct a list of colors associated
with each production, and the format attribute then uses that list to choose
the color for each part of the production. For more information on how the
format attribute chooses a color from the list, see above, but essentially,
each terminal or %c in the format attribute advances the pointer in the list
by one element, and terminals and %r reset the current color to the default
foreground color of the terminal afterwards.

There are two ways you can construct a list of colors associated with a
production:

  • The color attribute creates the entire list all with the same color, as
    specified by the value of the attribute. When combined with the default format
    attribute, this will color all the terminals in that production that color, but
    more advanced techniques can be used as well.

  • The colors attribute creates the list from a manual, comma-separated list
    of colors. The attribute is invalid if the length of the list is not equal to
    the number of terminals in the production plus the number of %c substrings in
    the format attribute.
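
The length requirement on the colors attribute can be written down as a check in Python. This is a sketch of the rule as stated above; the function names are hypothetical, and the number of terminals is passed in rather than derived from a production.

```python
def count_color_advances(fmt):
    """Count the %c escape sequences in a format string, skipping other escapes such as %%."""
    count = 0
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            if fmt[i + 1] == "c":
                count += 1
            i += 2  # skip the escape introducer and the character after it
        else:
            i += 1
    return count

def colors_attribute_is_valid(colors, num_terminals, fmt):
    """Check that the comma-separated list has one color per terminal and per %c."""
    color_list = [color.strip() for color in colors.split(",")]
    return len(color_list) == num_terminals + count_color_advances(fmt)

print(colors_attribute_is_valid("red, blue", 1, "%1 %c%2%r"))
print(colors_attribute_is_valid("red", 1, "%1 %c%2%r"))
```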

Attributes Reference

Attribute Syntax Overview

In K, many different syntactic categories accept an optional trailing list of
keywords known as attributes. Attribute lists have two different syntaxes,
depending on where they occur. Each attribute also has a type which describes
where it may occur.

The first syntax is a square-bracketed ([]) list of words. This syntax is
available for the following attribute types:

  1. module attributes - may appear immediately after the module keyword
  2. sort attributes - may appear immediately after a sort declaration
  3. production attributes - may appear immediately after a BNF production
    alternative
  4. rule attributes - may appear immediately after a rule
  5. context attributes - may appear immediately after a context or context
    alias
  6. context alias attributes - may appear immediately after a context alias
  7. claim attributes - may appear immediately after a claim

The second syntax is the XML attribute syntax, i.e., a space-delimited list of
key-and-quoted-value pairs appearing inside the start tag of an XML element:
<element key1="value" key2="value2" ... > </element>. This syntax is
available for the following attribute types:

  1. cell attributes - may appear inside of the cell start tag in
    configuration declarations

Unrecognized attributes are reported as an error. When we talk about
the type of an attribute, we mean a syntactic category to which an attribute
can be attached where the attribute has some semantic effect.

Attribute Index

We now provide an index of available attributes organized alphabetically with a
brief description of each. Note that the same attribute may appear in the index
multiple times to indicate its effect in different contexts or with/without
arguments. A legend describing how to interpret the index follows.

Name                 | Type  | Backend | Reference
alias-rec            | prod  | all     | Macros and Aliases
alias                | prod  | all     | Macros and Aliases
all-path             | claim | haskell | all-path and one-path attributes to distinguish reachability claims
anywhere             | rule  | all     | anywhere rules
applyPriority(_)     | prod  | all     | Symbol priority and associativity
avoid                | prod  | all     | Symbol priority and associativity
binder               | prod  | all     | No reference yet.
bracket              | prod  | all     | Parametric productions and bracket attributes
color(_)             | prod  | all     | color and colors attributes
colors(_)            | prod  | all     | color and colors attributes
concrete             | mod   | llvm    | symbolic and concrete attribute
concrete(_)          | rule  | haskell | concrete and symbolic attributes (Haskell backend)
concrete             | rule  | haskell | concrete and symbolic attributes (Haskell backend)
context(_)           | alias | all     | Context aliases
deprecated           | prod  | all     | deprecated attribute
exit = ""            | cell  | all     | exit attribute
format               | prod  | all     | format attribute
freshGenerator       | prod  | all     | freshGenerator attribute
function             | prod  | all     | function and total attributes
group(_)             | all   | all     | Symbol priority and associativity
hook(_)              | prod  | all     | No reference yet
hybrid(_)            | prod  | all     | hybrid attribute
hybrid               | prod  | all     | hybrid attribute
klabel(_)            | prod  | all     | klabel(_) and symbol attributes
left                 | prod  | all     | Symbol priority and associativity
locations            | sort  | all     | Location Information
macro-rec            | prod  | all     | Macros and Aliases
macro                | prod  | all     | Macros and Aliases
memo                 | rule  | haskell | The memo attribute
multiplicity = "_"   | cell  | all     | Collection Cells: multiplicity and type attributes
non-assoc            | prod  | all     | Symbol priority and associativity
one-path             | claim | haskell | all-path and one-path attributes to distinguish reachability claims
overload(_)          | prod  | all     | overload(_) attribute
owise                | rule  | all     | owise and priority attributes
prec(_)              | token | all     | prec attribute
prefer               | prod  | all     | Symbol priority and associativity
priority(_)          | rule  | all     | owise and priority attributes
private              | mod   | all     | private attribute
private              | prod  | all     | public and private attribute
public               | mod   | all     | No reference yet.
public               | prod  | all     | public and private attribute
result(_)            | ctxt  | all     | result attribute
result(_)            | rule  | all     | result attribute
right                | prod  | all     | Symbol priority and associativity
seqstrict(_)         | prod  | all     | strict and seqstrict attributes
seqstrict            | prod  | all     | strict and seqstrict attributes
simplification       | rule  | haskell | simplification attribute (Haskell backend)
simplification(_)    | rule  | haskell | simplification attribute (Haskell backend)
smt-hook(_)          | prod  | haskell | SMT Translation
smtlib(_)            | prod  | haskell | SMT Translation
smt-lemma            | rule  | haskell | SMT Translation
strict               | prod  | all     | strict and seqstrict attributes
strict(_)            | prod  | all     | strict and seqstrict attributes
symbolic             | mod   | haskell | symbolic and concrete attribute
symbolic             | rule  | haskell | concrete and symbolic attributes (Haskell backend)
symbolic(_)          | rule  | haskell | concrete and symbolic attributes (Haskell backend)
symbol               | prod  | all     | klabel(_) and symbol attributes
terminator-symbol(_) | prod  | all     | klabel(_) and symbol attributes
token                | prod  | all     | token attribute
token                | sort  | all     | token attribute
total                | prod  | all     | function and total attributes
trusted              | claim | haskell | trusted attribute
type = "_"           | cell  | all     | Collection Cells: multiplicity and type attributes
unboundVariables(_)  | rule  | all     | The unboundVariables attribute
unused               | prod  | all     | unused attribute
concrete             | mod   | all     | Specify that this module should only be included in concrete backends (LLVM backend).
symbolic             | mod   | all     | Specify that this module should only be included in symbolic backends (Haskell backend).
stream = "_"         | cell  | all     | Specify that this cell should be hooked up to a stream, either stdin, stdout, or stderr.

Internal Attribute Index

Some attributes should not generally appear in user code, except in some
unusual or complex examples. Such attributes are typically generated by the
compiler and used internally. We list these attributes below as a reference for
interested readers:

Name         | Type | Backend | Reference
assoc        | prod | all     | assoc, comm, idem and unit attributes
comm         | prod | all     | assoc, comm, idem and unit attributes
digest       | mod  | all     | Contains the hash of the textual contents of the module.
idem         | prod | all     | assoc, comm, idem and unit attributes
unit         | prod | all     | assoc, comm, idem and unit attributes
userList     | prod | all     | Identifies the desugared form of Lst ::= List{Elm,"delim"}
predicate    | prod | all     | Specifies the sort of a predicate label
element      | prod | all     | Specifies the label of the elements in a list
bracketLabel | prod | all     | Keep track of the label of a bracket production since it can't have a klabel
injective    | prod | all     | Label a given production as injective (unique output for each input)
internal     | prod | all     | Production is reserved for internal use by the compiler
cool         | rule | all     | strict and seqstrict attributes
heat         | rule | all     | strict and seqstrict attributes

Index Legend

  • Name - the attribute's name (optionally followed by (_) to indicate that the attribute takes arguments)

  • Type - the syntactic categories where this attribute is not ignored;
    the possible values are the types mentioned above or shorthands:

    1. all - short for any type except cell
    2. mod - short for module
    3. sort
    4. prod - short for production
    5. rule
    6. ctxt - short for context or context alias
    7. claim
    8. cell
  • Backend - the backends that do not ignore this attribute; possible values:

    1. all - all backends
    2. llvm - the LLVM backend
    3. haskell - the Haskell backend
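  • Reference - the section of this document that describes the attribute or,
    where no such section exists, a brief description of the attribute's effect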

Pending Documentation

Backend features not yet given documentation:

  • Parser of KORE terms and definitions
  • Term representation of K terms
  • Hooked sorts and symbols
  • Substituting a substitution into the RHS of a rule
    • domain values
    • functions
    • variables
    • symbols
    • polymorphism
    • hooks
    • injection compaction
    • overload compaction
  • Pattern Matching / Unification of subject and LHS of rule
    • domain values
    • symbols
    • side conditions
    • and/or patterns
    • list patterns
    • nonlinear variables
    • map/set patterns
      • deterministic
      • nondeterministic
    • modulo injections
    • modulo overloads
  • Stepping
    • initialization
    • termination
  • Print kore terms
  • Equality/comparison of terms
  • Owise rules
  • Strategy #STUCK axiom
  • User substitution
    • binders
    • kvar

To get a complete list of hooks supported by K, you can run:

grep -P -R "(?<=[^-])hook\([^)]*\)" k-distribution/include/kframework/builtin/ \
     --include "*.k" -ho | \
sed 's/hook(//' | sed 's/)//' | sort | uniq | grep -v org.kframework

All of these hooks will also eventually need documentation.
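
Where grep -P is unavailable, the same extraction can be sketched in Python. This is only an illustration: the function name list_hooks and the sample text are assumptions made for this example, while the regular expression is copied from the command above.

```python
import re

# Same pattern as the grep command: "hook(...)" not preceded by "-".
HOOK_RE = re.compile(r"(?<=[^-])hook\([^)]*\)")


def list_hooks(text):
    """Return the sorted, de-duplicated hook names found in K source text."""
    names = set()
    for match in HOOK_RE.findall(text):
        name = match[len("hook("):-1]      # mirrors the two sed substitutions
        if "org.kframework" not in name:   # mirrors grep -v org.kframework
            names.add(name)
    return sorted(names)


# Hand-written sample resembling lines from the builtin modules.
sample = """
  syntax Map [hook(MAP.Map)]
  syntax Map ::= ".Map" [function, total, hook(MAP.unit), symbol(.Map)]
  syntax Int ::= size(Map) [function, total, hook(MAP.size), symbol(sizeMap)]
  syntax Map ::= ".Map" [function, total, hook(MAP.unit)]
"""
print(list_hooks(sample))
```

To cover a whole installation, the text of every *.k file under the builtin directory could be fed to list_hooks and the results merged.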


  1. Except for in a very limited number of special cases from the
    K standard library. ↩︎

  2. The Maude documentation
    has an example in a context that's somewhat similar to K; discussion of
    ad-hoc overloading is not relevant. ↩︎

K Cheat Sheet

This is a quick reference of the most commonly used K tools.

kompile (--gen-bison-parser)? {file}                : generate parser, optionally also an ahead-of-time Bison parser
krun {file}                                         : interpret file
krun -cPGM='{string}'                               : interpret string
kast --output (kore | kast) (-e|{file})             : parse expression or file
kompile (--enable-search --backend haskell)? {file} : generate parser, enabling non-deterministic run
krun (--search-all)? {file}                         : interpret file, evaluating non-deterministic runs as well
foo-kompiled/parser_PGM {file}                      : ahead of time parse
kompile (--main-module)? (--syntax-module)? {file}  : generate parser for {file}.k {file}-syntax.k, explicitly state main modules
kparse {file} | kore-print -                        : parse and unparse a file
kompile {file} --enable-llvm-debug                  : generate debuggable output for {file}.k
krun {file} --debugger                              : debug K code
kprove {file}                                       : verify specs in {file}

During a GDB debugging session (see the GDB to LLDB command map for
LLDB breakpoint syntax):

break {file}:{linenum}                              : add a breakpoint to {file}'s {linenum} numbered line
k match {module}.{label} subject                    : investigate matching

K Tools

Here we document how to use some of the most commonly used K tools.

Minimizing Output

When one is working with kore-repl or the prover in general and looking at
specific configurations using config, sometimes the configurations can be huge.

One tool that helps print configurations compactly is the pyk print utility:

pyk print

pyk print applies the --minimize option by default. This filters out many
uninteresting cells from the current configuration and makes the result more
compact.

Then, when invoking the prover, you can minimize your output by piping it into
the pyk print ... facility with arguments for controlling the output:

kprove --output json --definition DEFN ... \
    | jq .term                             \
    | pyk print DEFN /dev/stdin --omit-labels ... --keep-labels ...

You can also use this more easily in the kore-repl by writing a helper script.
In your current directory, save a new script pykprint.sh:

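#!/bin/bash
# Usage: pykprint.sh DEFN [LABELS]
#   $1: the kompiled definition; $2 (optional): comma-separated labels to omit

kast --input kore --output json --definition "$1" /dev/stdin \
    | jq .term                                               \
    | pyk print "$1" /dev/stdin --omit-labels $2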

Now call config | bash pykprint.sh DEFN in the Kore REPL to make the output
smaller.

The options you have to control the output are as follows:

  • --no-minimize: do not remove uninteresting cells.
  • --omit-cells: remove the selected cells from the output.
  • --keep-cells: keep only the selected cells in the output.

Note: Make sure that there is no whitespace around , in the omit list,
otherwise you'll get an error (, is a list separator, so this
requirement is strict).
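
As a small illustration of the note above, a wrapper could normalize the label list before handing it to pyk print. The helpers below are hypothetical (neither omit_labels_arg nor pyk_print_argv is part of pyk), and labelA and labelB are placeholder names. The sketch only builds the comma-separated value and the argument vector used in the pipeline shown earlier.

```python
def omit_labels_arg(labels):
    """Join label names with bare commas, dropping stray whitespace."""
    cleaned = [label.strip() for label in labels if label.strip()]
    return ",".join(cleaned)


def pyk_print_argv(defn, labels):
    """Argument vector mirroring: pyk print DEFN /dev/stdin --omit-labels LABELS."""
    return ["pyk", "print", defn, "/dev/stdin",
            "--omit-labels", omit_labels_arg(labels)]


# Placeholder label names with accidental surrounding whitespace.
print(pyk_print_argv("DEFN", ["labelA ", " labelB"]))
```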

Debugging

The LLVM Backend has support for integration with GDB. You can run the debugger
on a particular program by passing the --debugger flag to krun, or by
invoking the llvm backend interpreter directly. Below we provide a simple
tutorial to explain some of the basic commands supported by the LLVM backend.

LLDB Support

GDB is not well-supported on macOS, particularly on newer OS versions and Apple
Silicon ARM hardware. Consequently, if the --debugger option is passed to krun
on macOS, LLDB is launched instead of GDB. However, the K-specific debugger
scripts that GDB uses have not been ported to LLDB yet, and so the instructions
in the rest of this section will not work.

The K Definition

Here is a sample K definition we will use to demonstrate debugging
capabilities:

module TEST
  imports INT

  configuration <k> foo(5) </k>
  rule [test]: I:Int => I +Int 1 requires I <Int 10

  syntax Int ::= foo(Int) [function]
  rule foo(I) => 0 -Int I

endmodule

You should compile this definition with --backend llvm --enable-llvm-debug to
use the debugger most effectively.

Stepping

Important: When you first run krun with option --debugger, GDB / LLDB
will instruct you on how to modify ~/.gdbinit or ~/.lldbinit to enable
printing abstract syntax of K terms in the debugger. If you do not perform this
step, you can still use all the other features, but K terms will be printed as
their raw address in memory.

GDB will need the kompiled interpreter in its safe path in order to access the
pretty printing Python script within it. A good way to do this would be to pick
a minimum top-level path that covers all of your kompiled semantics (e.g., set
auto-load safe-path ~/k-semantics). LLDB has slightly different security
policies that do not require fully-arbitrary code execution.

This section uses GDB syntax to demonstrate the debugging features. If you are
on macOS, please refer to the GDB to LLDB command map.

You can break before every step of execution is taken by setting a breakpoint
on the k_step function.

(gdb) break definition.kore:k_step
Breakpoint 1 at 0x25e340
(gdb) run
Breakpoint 1, 0x000000000025e340 in step (subject=`<generatedTop>{}`(`<k>{}`(`kseq{}`(`inj{Int{}, KItem{}}`(#token("0", "Int")),dotk{}(.KList))),`<generatedCounter>{}`(#token("0", "Int"))))
(gdb) continue
Continuing.

Breakpoint 1, 0x000000000025e340 in step (subject=`<generatedTop>{}`(`<k>{}`(`kseq{}`(`inj{Int{}, KItem{}}`(#token("1", "Int")),dotk{}(.KList))),`<generatedCounter>{}`(#token("0", "Int"))))
(gdb) continue 2
Will ignore next crossing of breakpoint 1.  Continuing.

Breakpoint 1, 0x000000000025e340 in step (subject=`<generatedTop>{}`(`<k>{}`(`kseq{}`(`inj{Int{}, KItem{}}`(#token("3", "Int")),dotk{}(.KList))),`<generatedCounter>{}`(#token("0", "Int"))))
(gdb)

Breaking on a specific rule

You can break when a rule is applied by giving the rule a rule label. If the
module name is TEST and the rule label is test, you can break when the rule
applies by setting a breakpoint on the TEST.test.rhs function:

(gdb) break TEST.test.rhs
Breakpoint 1 at 0x25e250: file /home/dwightguth/test/./test.k, line 4.
(gdb) run
Breakpoint 1, TEST.test.rhs (VarDotVar0=`<generatedCounter>{}`(#token("0", "Int")), VarDotVar1=dotk{}(.KList), VarI=#token("0", "Int")) at /home/dwightguth/test/./test.k:4
4         rule [test]: I:Int => I +Int 1 requires I <Int 10
(gdb)

Note that the substitution associated with that rule is visible in the
description of the frame.

You can also break when a side condition is applied using the TEST.test.sc
function:

(gdb) break TEST.test.sc
Breakpoint 1 at 0x25e230: file /home/dwightguth/test/./test.k, line 4.
(gdb) run
Breakpoint 1, TEST.test.sc (VarI=#token("0", "Int")) at /home/dwightguth/test/./test.k:4
4         rule [test]: I:Int => I +Int 1 requires I <Int 10
(gdb)

Note that every variable used in the side condition can have its value
inspected when stopped at this breakpoint, but other variables are not visible.

You can also break on a rule by its location:

(gdb) break test.k:4
Breakpoint 1 at 0x25e230: test.k:4. (2 locations)
(gdb) run
Breakpoint 1, TEST.test.sc (VarI=#token("0", "Int")) at /home/dwightguth/test/./test.k:4
4         rule [test]: I:Int => I +Int 1 requires I <Int 10
(gdb) continue
Continuing.

Breakpoint 1, TEST.test.rhs (VarDotVar0=`<generatedCounter>{}`(#token("0", "Int")), VarDotVar1=dotk{}(.KList), VarI=#token("0", "Int")) at /home/dwightguth/test/./test.k:4
4         rule [test]: I:Int => I +Int 1 requires I <Int 10
(gdb) continue
Continuing.

Breakpoint 1, TEST.test.sc (VarI=#token("1", "Int")) at /home/dwightguth/test/./test.k:4
4         rule [test]: I:Int => I +Int 1 requires I <Int 10
(gdb)

Note that this sets a breakpoint at two locations: one on the side condition
and one on the right hand side. If the rule had no side condition, the first
would not be set. You can also view the locations of the breakpoints and
disable them individually:

(gdb) info breakpoint
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   <MULTIPLE>
        breakpoint already hit 3 times
1.1                         y     0x000000000025e230 in TEST.test.sc at /home/dwightguth/test/./test.k:4
1.2                         y     0x000000000025e250 in TEST.test.rhs at /home/dwightguth/test/./test.k:4
(gdb) disable 1.1
(gdb) continue
Continuing.

Breakpoint 1, TEST.test.rhs (VarDotVar0=`<generatedCounter>{}`(#token("0", "Int")), VarDotVar1=dotk{}(.KList), VarI=#token("1", "Int")) at /home/dwightguth/test/./test.k:4
4         rule [test]: I:Int => I +Int 1 requires I <Int 10
(gdb) continue
Continuing.

Breakpoint 1, TEST.test.rhs (VarDotVar0=`<generatedCounter>{}`(#token("0", "Int")), VarDotVar1=dotk{}(.KList), VarI=#token("2", "Int")) at /home/dwightguth/test/./test.k:4
4         rule [test]: I:Int => I +Int 1 requires I <Int 10
(gdb)

Now only the breakpoint when the rule applies is enabled.

Breaking on a function

You can also break when a particular function in your semantics is invoked:

(gdb) info functions foo
All functions matching regular expression "foo":

File /home/dwightguth/test/./test.k:
struct __mpz_struct *Lblfoo'LParUndsRParUnds'TEST'UndsUnds'Int(struct __mpz_struct *);
(gdb) break Lblfoo'LParUndsRParUnds'TEST'UndsUnds'Int
Breakpoint 1 at 0x25e640: file /home/dwightguth/test/./test.k, line 6.
(gdb) run
Breakpoint 1, Lblfoo'LParUndsRParUnds'TEST'UndsUnds'Int (_1=#token("1", "Int")) at /home/dwightguth/test/./test.k:6
6         syntax Int ::= foo(Int) [function]
(gdb)

In this case, the variables have numbers instead of names because the names of
arguments in functions in K come from rules, and we are stopped before any
specific rule has applied. For example, _1 is the first argument to the
function.

You can also set a breakpoint in this location by setting it on the line
associated with its production:

(gdb) break test.k:6
Breakpoint 1 at 0x25e640: file /home/dwightguth/test/./test.k, line 6.
(gdb) run
Breakpoint 1, Lblfoo'LParUndsRParUnds'TEST'UndsUnds'Int (_1=#token("1", "Int")) at /home/dwightguth/test/./test.k:6
6         syntax Int ::= foo(Int) [function]

These two syntaxes are equivalent; use whichever is easier for you.

You can also view the stack of function applications:

(gdb) bt
#0  Lblfoo'LParUndsRParUnds'TEST'UndsUnds'Int (_1=#token("1", "Int")) at /home/dwightguth/test/./test.k:6
#1  0x000000000025e5f8 in apply_rule_111 (VarDotVar0=`<generatedCounter>{}`(#token("0", "Int")), VarDotVar1=dotk{}(.KList)) at /home/dwightguth/test/./test.k:9
#2  0x0000000000268a52 in take_steps ()
#3  0x000000000026b7b4 in main ()
(gdb)

Here we see that foo was invoked while applying the rule on line 9 of test.k,
and we can also see the substitution of that rule. If foo were evaluated while
evaluating another function, we would also be able to see the arguments of that
function, unless the function was tail recursive, in which case no
stack frame would exist once the tail call was performed.

Breaking on a set of rules or functions

Using rbreak <regex> you can set breakpoints on multiple functions.

  • rbreak Lbl - sets a breakpoint on all non-hooked functions

  • rbreak Lbl.*TEST - sets a breakpoint on all functions from module TEST

  • rbreak hook_INT - sets a breakpoint on all hooks from module INT
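
Because rbreak searches for its regular expression anywhere in a function name, the three patterns above select different sets of symbols. The Python sketch below only imitates that matching on a hand-written list of names. The first name is taken from the session above; the others are assumed for illustration and may not exist in your interpreter.

```python
import re

symbols = [
    "Lblfoo'LParUndsRParUnds'TEST'UndsUnds'Int",   # from the session above
    "Lblbar'LParUndsRParUnds'OTHER'UndsUnds'Int",  # hypothetical function in a module OTHER
    "hook_INT_add",                                # assumed hook symbol name
    "take_steps",
]


def rbreak_matches(pattern, names):
    """Names that an unanchored regex search for `pattern` would select."""
    return [name for name in names if re.search(pattern, name)]


for pattern in ("Lbl", "Lbl.*TEST", "hook_INT"):
    print(pattern, "->", rbreak_matches(pattern, symbols))
```

Since the search is unanchored, a pattern such as Lbl.*TEST also selects any function whose name merely contains TEST after Lbl, so a more specific pattern may be needed in larger definitions.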

Other debugger issues

  • If variables show up as <optimized out>, try kompiling without -O1, -O2, or -O3.
  • If (gdb) break definition.kore:break fails with "No source file named definition.kore.",
    pass --enable-llvm-debug to kompile in order to generate debug info symbols.

Profiling your K semantics

The first thing to be aware of is that, in order to get meaningful data,
you need to build the semantics and all of its dependencies with
optimizations enabled but without the frame pointer elimination
optimization. For example, for EVM, this means rebuilding GMP, MPFR,
JEMalloc, Crypto++, SECP256K1, etc. with the following exports.

export CFLAGS="-DNDEBUG -O2 -fno-omit-frame-pointer"
export CXXFLAGS="-DNDEBUG -O2 -fno-omit-frame-pointer"

You can skip this step, but if you do, any samples within these
libraries will not have correct stack trace information, which means
you will likely not get a meaningful set of data that will tell you
where the majority of time is really being spent. Don't worry about
rebuilding literally every single dependency though. Just focus on the
ones that you expect to take a non-negligible amount of runtime. You
will be able to tell if you haven't done enough later, and you can go
back and rebuild more. Once this is done, you then build K with
optimizations and debug info enabled, like so:

mvn package -Dproject.build.type="FastBuild"

Next, you build the semantics with optimizations and debug info
enabled (i.e., kompile -ccopt -O2 --iterated -ccopt -fno-omit-frame-pointer).

Once all this is done, you should be ready to profile your
application. Essentially, you should run whatever test suite you
usually run, but with perf record -g -- prefixed to the front. For
example, for KEVM it's the following command. (For best data, don't
run this step in parallel.)

perf record -g -- make test-conformance

Finally, you want to filter out just the samples that landed within
the llvm backend and view the report. For this, you need to know the
name of the binary that was generated by your build system. Normally
it is interpreter, but e.g. if you are building the web3 client for
kevm, it would be kevm-client. You will want to run the following
command.

perf report -g -c $binary_name

If all goes well, you should see a breakdown of where CPU time has
been spent executing the application. You will know that sufficient
time was spent rebuilding dependencies with the correct flags when the
total time reported by the main method is close to 100%. If it's not
close to 100%, this is probably because a decent amount of self time
was reported in stack traces that were not built with frame pointers
enabled, meaning that perf was unable to walk the stack. You will have
to go back, rebuild the appropriate libraries, and then record your
trace again.

Your ultimate goal is to identify the hotspots that take the most
time, and make them execute faster. Entries like step and step_1234
refer to the cost of matching. An entry like side_condition_1234 is
a side condition, and apply_rule_1234 is constructing the right-hand
side of a rule. You can convert from this rule
ordinal to a location using the llvm-kompile-compute-loc script in
the bin folder of the llvm backend repo. For example,

llvm-kompile-compute-loc 5868 evm-semantics/.build/defn/llvm/driver-kompiled

spits out the following text.

Line: 18529
/home/dwightguth/evm-semantics/./.build/defn/llvm/driver.k:493:10

This is the line of definition.kore that the axiom appears on as
well as the original location of the rule in the K semantics. You can
use this information to figure out which rules and functions are
taking the most time and optimize them to be more efficient.
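
When scanning a long perf report, it can help to bucket symbol names by the naming scheme described above. The following sketch assumes the names have exactly the forms mentioned in this section (step, step_N, side_condition_N, apply_rule_N). The classify function and its category strings are illustrative and not part of any K tool.

```python
import re

# (pattern, category) pairs for the symbol name forms described in this section.
PATTERNS = [
    (re.compile(r"^step(?:_(\d+))?$"), "matching"),
    (re.compile(r"^side_condition_(\d+)$"), "side condition"),
    (re.compile(r"^apply_rule_(\d+)$"), "rule right-hand side"),
]


def classify(symbol):
    """Return (category, numeric suffix or None) for a perf symbol name."""
    for regex, category in PATTERNS:
        match = regex.match(symbol)
        if match:
            suffix = match.group(1)
            return category, int(suffix) if suffix else None
    return "other", None


for name in ("step", "step_1234", "side_condition_1234", "apply_rule_5868", "main"):
    print(name, classify(name))
```

The numeric suffix of a side_condition_N or apply_rule_N entry is the value that can then be passed to llvm-kompile-compute-loc.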

Running tests - kserver

The kserver is a front-end tool based on Nailgun
which helps to reduce the startup time of the JVM. While kserver is running in a terminal
window, it waits for kompile/kprove calls and forces them to run in the same process
and share the same threads. This also reduces thread contention significantly. kompile
uses all available threads for rule parsing. Another benefit is that it saves caches,
so each time you call kprove/kast, you can access those directly without extra disk usage.
Running the regression-new integration tests on a powerful machine (32 threads) takes 8m
without the kserver and 2m with it active. You can start the kserver in two ways.

  • blocking: call kserver in the command line. Close it after you are done testing. Useful for quick testing.
  • non-blocking: call spawn-kserver <log.file> and close it with stop-kserver - this is used for automation on CI

Because we reuse caches, you should stop and restart the server between runs.
The Nailgun implementation hasn't been updated in the last 3-5 years, and it's not compatible with Java 18 onwards.

K Builtins

The K Builtins (also referred to as the K Prelude or the K Standard Library)
consist of several files which contain definitions that make working with K
simpler. These files can be found under include/kframework/builtin in your K
installation directory, and can be imported with requires "FILENAME" (without
the path prefix).

  • domains: Basic datatypes which are universally useful.
  • kast: Representation of K internal data-structures (not to be
    included in normal definitions).
  • prelude: Automatically included into every K definition.
  • ffi: FFI interface for calling out to native C code from K.
  • json: JSON datatype and parsers/unparsers for JSON strings.
  • rat: Rational number representation.
  • substitution: Hooked implementation of capture-aware
    substitution for K definitions.
  • unification: Hooked implementation of unification
    exposed directly to K definitions.

Basic Builtin Types in K

A major piece of the K prelude consists of a series of modules that contain
implementations of basic data types and language features in K. You do not need
to require this file yourself; it is required automatically in every K
definition unless --no-prelude is passed to kompile. K may not work correctly
if some of these modules do not exist or do not declare certain functions.

Note that some functions in the K prelude are not total, that is,
they are not defined on all possible input values. When you invoke such a
function on an undefined input, the behavior is undefined. In particular, when
this happens, interpreters generated by the K LLVM backend may crash.

requires "kast.md"

Default Modules

K declares certain modules that contain most of the builtins you usually want
when defining a language in K. In particular, this includes integers, booleans,
strings, identifiers, I/O, lists, maps, and sets. The DOMAINS-SYNTAX module
is designed to be imported by the syntax module of the language and contains
only the program-level syntax of identifiers, integers, booleans, and strings.
The DOMAINS module contains the rest of the syntax, including builtin
functions over those and the remaining types.

Note that not all modules are included in DOMAINS. A few less-common modules
are not, including ARRAY, COLLECTIONS, FLOAT, STRING-BUFFER, BYTES,
K-REFLECTION, MINT.

module DOMAINS-SYNTAX
  imports SORT-K
  imports ID-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  imports BOOL-SYNTAX
  imports STRING-SYNTAX
endmodule

module DOMAINS
  imports DOMAINS-SYNTAX
  imports INT
  imports BOOL
  imports STRING
  imports BASIC-K
  imports LIST
  imports K-IO
  imports MAP
  imports SET
  imports ID
  imports K-EQUAL
endmodule

Arrays

Provided here is an implementation for fixed-sized, contiguous maps from Int
to KItem. In some previous versions of K, the Array type was a builtin type
backed by mutable arrays of objects. However, in modern K, the Array type is
implemented by means of the List type; users should not access this interface
directly and should instead make use only of the functions listed below. Users
of this module should import only the ARRAY module.

module ARRAY-SYNTAX
  imports private LIST

  syntax Array

Array lookup

You can look up an element in an Array by its index in O(log(N)) time. Note
that the base of the logarithm is a relatively high number and thus the time is
effectively constant.

  syntax KItem ::= Array "[" Int "]" [function]

Array update

You can create a new Array with a new value for a key in O(log(N)) time, or
effectively constant.

  syntax Array ::= Array "[" key: Int "<-" value: KItem "]" [function, symbol(_[_<-_])]

Array reset

You can create a new Array where a particular key is reset to its default
value in O(log(N)) time, or effectively constant.

  syntax Array ::= Array "[" Int "<-" "undef" "]" [function]

Multiple array update

You can create a new Array from a List L of size N where the N
elements starting at index are replaced with the contents of L, in
O(N*log(K)) time (where K is the size of the array), or effectively linear.
Having index + N > K yields an exception.

  syntax Array ::= updateArray(Array, index: Int, List) [function]

Array fill

You can create a new Array where the length elements starting at index
are replaced with value, in O(length*log(N)) time, or effectively linear.

  syntax Array ::= fillArray(Array, index: Int, length: Int, value: KItem) [function]

Array range check

You can test whether an integer is within the bounds of an array in O(1) time.

  syntax Bool ::= Int "in_keys" "(" Array ")" [function, total]
endmodule

module ARRAY-IN-K [private]
  imports public ARRAY-SYNTAX
  imports private LIST
  imports private K-EQUAL
  imports private INT
  imports private BOOL

Array creation

You can create an array with length elements where each element is
initialized to value in O(1) time. Note that the array is stored in a manner
where only the highest element that is actually modified is given a value
in its internal representation, which means that subsequent array operations
may incur a one-time O(N) resizing cost, possibly amortized across multiple
operations.

  syntax Array ::= makeArray(length: Int, value: KItem) [function, public]

Implementation of Arrays

The remainder of this section consists of an implementation in K of the
operations listed above. Users of the ARRAY module should not make use
of any of the syntax defined in any of these modules.

  syntax Array ::= arr(List, Int, KItem)

  rule makeArray(I::Int, D::KItem) => arr(.List, I, D)

  rule arr(L::List, _, _       ) [ IDX::Int ] => L[IDX] requires 0 <=Int IDX andBool IDX  <Int size(L)
  rule arr(_      , _, D::KItem) [ _        ] => D      [owise]

  syntax List ::= ensureOffsetList(List, Int, KItem) [function]
  rule ensureOffsetList(L::List, IDX::Int, D::KItem) => L makeList(IDX +Int 1 -Int size(L), D) requires         IDX >=Int size(L)
  rule ensureOffsetList(L::List, IDX::Int, _::KItem) => L                                      requires notBool IDX >=Int size(L)

  rule arr(L::List, I::Int, D::KItem) [ IDX::Int <- VAL::KItem ] => arr(ensureOffsetList(L, IDX, D) [ IDX <- VAL ], I, D)

  rule arr(L::List, I::Int, D::KItem) [ IDX::Int <- undef ] => arr(L, I, D) [ IDX <- D ]

  rule updateArray(arr(L::List, I::Int, D::KItem), IDX::Int, L2::List) => arr(updateList(ensureOffsetList(L, IDX +Int size(L2) -Int 1, D), IDX, L2), I, D)

  rule fillArray(arr(L::List, I::Int, D::KItem), IDX::Int, LEN::Int, VAL::KItem) => arr(fillList(ensureOffsetList(L, IDX +Int LEN -Int 1, D), IDX, LEN, VAL), I, D)

  rule IDX::Int in_keys(arr(_, I::Int, _)) => IDX >=Int 0 andBool IDX <Int I
endmodule

module ARRAY-SYMBOLIC [symbolic]
  imports ARRAY-IN-K
endmodule

module ARRAY-KORE
  imports ARRAY-IN-K
endmodule

module ARRAY
  imports ARRAY-SYMBOLIC
  imports ARRAY-KORE
endmodule
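
To make the lazy representation concrete, here is a small Python model of the rules in ARRAY-IN-K above. It is a sketch for intuition only: the class and method names are invented, a Python list stands in for List, the model mutates in place for brevity (the K functions return a new Array), and error cases of the underlying list operations, such as negative indices, are not modelled.

```python
class Arr:
    """Model of arr(L, I, D): stored prefix L, declared length I, default D."""

    def __init__(self, length, default):        # makeArray(length, default)
        self.items, self.length, self.default = [], length, default

    def _ensure_offset(self, idx):               # ensureOffsetList(L, idx, D)
        if idx >= len(self.items):
            self.items += [self.default] * (idx + 1 - len(self.items))

    def lookup(self, idx):                       # A [ idx ]
        if 0 <= idx < len(self.items):
            return self.items[idx]
        return self.default                      # the [owise] rule

    def update(self, idx, value):                # A [ idx <- value ]
        self._ensure_offset(idx)
        self.items[idx] = value

    def reset(self, idx):                        # A [ idx <- undef ]
        self.update(idx, self.default)

    def in_keys(self, idx):                      # idx in_keys(A)
        return 0 <= idx < self.length


a = Arr(10, 0)
a.update(3, 7)
print(a.items)   # the stored prefix grows only up to the highest index written
print(a.lookup(3), a.lookup(8), a.in_keys(8), a.in_keys(10))
```

The model shows why makeArray is O(1) while a later update may pay a one-time padding cost: nothing is stored until an index is written, and lookups beyond the stored prefix fall back to the default value.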

Maps

Provided here is the syntax of an implementation of immutable, associative,
commutative maps from KItem to KItem. This type is hooked to an
implementation of maps provided by the backend. For more information on
matching on maps and allowable patterns for doing so, refer to K's
user documentation.

module MAP
  imports private BOOL-SYNTAX
  imports private INT-SYNTAX
  imports private LIST
  imports private SET

  syntax Map [hook(MAP.Map)]

Map concatenation

The Map sort represents a generalized associative array. Each key can be
paired with an arbitrary value, and can be used to reference its associated
value. Multiple bindings for the same key are not allowed.

You can construct a new Map consisting of key/value pairs of two Maps. The
result is #False if the maps have keys in common (in particular, this will
yield an exception during concrete execution). This operation is O(N*log(M))
where N is the size of the smaller map, when it appears on the right hand side.
When it appears on the left hand side and all variables are bound, it is
O(N*log(M)) where M is the size of the map it is matching and N is the number
of elements being matched. When it appears on the left hand side containing
variables not bound elsewhere in the term, it is O(N^K) where N is the size of
the map it is matching and K is the number of unbound keys being matched. In
other words, one unbound variable is linear, two is quadratic, three is cubic,
etc.

  syntax Map ::= Map Map                        [left, function, hook(MAP.concat), symbol(_Map_), assoc, comm, unit(.Map), element(_|->_), index(0), format(%1%n%2)]
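
The partiality of concatenation can be pictured with ordinary Python dictionaries. This is only an analogy, under the assumption that a dict stands in for Map and a raised exception stands in for the undefined (#False) result. The name concat is made up for this sketch.

```python
def concat(m1, m2):
    """Disjoint union of two dicts; undefined (here: an error) on shared keys."""
    shared = m1.keys() & m2.keys()
    if shared:
        raise ValueError(f"maps share keys: {sorted(shared)}")
    return {**m1, **m2}


print(concat({"a": 1}, {"b": 2}))
try:
    concat({"a": 1}, {"a": 1})     # same key twice, even with the same value
except ValueError as err:
    print("undefined:", err)
```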

Map unit

The map with zero elements is represented by .Map.

  syntax Map ::= ".Map"                         [function, total, hook(MAP.unit), symbol(.Map)]

Map elements

An element of a Map is constructed via the |-> operator. The key is on the
left and the value is on the right.

  syntax Map ::= KItem "|->" KItem                      [function, total, hook(MAP.element), symbol(_|->_), injective]

  syntax priority _|->_ > _Map_ .Map
  syntax non-assoc _|->_

Map lookup

You can look up the value associated with the key of a map in O(log(N)) time.
Note that the base of the logarithm is a relatively high number and thus the
time is effectively constant. The value is #False if the key is not in the
map (in particular, this will yield an exception during concrete execution).

  syntax KItem ::= Map "[" KItem "]"                    [function, hook(MAP.lookup), symbol(Map:lookup)]

Map lookup with default

You can also look up the value associated with the key of a map using a
total function that assigns a specific default value if the key is not present
in the map. This operation is also O(log(N)), or effectively constant.

  syntax KItem ::= Map "[" KItem "]" "orDefault" KItem      [function, total, hook(MAP.lookupOrDefault), symbol(Map:lookupOrDefault)]
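
The difference between the partial lookup and the total lookup with a default can be sketched with a Python dict. This is an analogy only: the function names are invented, and KeyError merely stands in for the undefined result of M [ K ] on a missing key.

```python
def lookup(m, key):
    """Partial lookup, like M [ K ]: no result for a missing key."""
    return m[key]                      # raises KeyError when key is absent


def lookup_or_default(m, key, default):
    """Total lookup, like M [ K ] orDefault D."""
    return m.get(key, default)


m = {"x": 1}
print(lookup(m, "x"), lookup_or_default(m, "y", 0))
```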

Map update

You can insert a key/value pair into a map in O(log(N)) time, or effectively
constant.

  syntax Map ::= Map "[" key: KItem "<-" value: KItem "]"           [function, total, symbol(Map:update), hook(MAP.update), prefer]

Map delete

You can remove a key/value pair from a map via its key in O(log(N)) time, or
effectively constant.

  syntax Map ::= Map "[" KItem "<-" "undef" "]"     [function, total, hook(MAP.remove), symbol(_[_<-undef])]

Map difference

You can remove the key/value pairs in a map that are present in another map in
O(N*log(M)) time (where M is the size of the first map and N is the size of the
second), or effectively linear. Note that only keys whose value is the same
in both maps are removed. To remove all the keys in one map from another map,
you can say removeAll(M1, keys(M2)).

  syntax Map ::= Map "-Map" Map                 [function, total, hook(MAP.difference)]
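
The distinction between map difference and removeAll is easy to miss, so here is a minimal Python model of the two behaviours described above. Dicts and sets stand in for Map and Set, and both function names are invented for this sketch.

```python
def map_difference(m1, m2):
    """Model of M1 -Map M2: drop entries of m1 that m2 contains with the same value."""
    return {k: v for k, v in m1.items() if not (k in m2 and m2[k] == v)}


def remove_all(m, keys):
    """Model of removeAll(M, S): drop every listed key regardless of its value."""
    return {k: v for k, v in m.items() if k not in keys}


m1 = {"a": 1, "b": 2, "c": 3}
m2 = {"a": 1, "b": 20}
print(map_difference(m1, m2))      # "b" survives because its values differ
print(remove_all(m1, set(m2)))     # corresponds to removeAll(M1, keys(M2))
```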

Multiple map update

You can update a map by adding all the key/value pairs in the second map in
O(N*log(M)) time (where M is the size of the first map and N is the size of the
second map), or effectively linear. If any keys are present in both maps, the
value from the second map overwrites the value in the first. This function is
total, which is distinct from map concatenation, a partial function only
defined on maps with disjoint keys.

  syntax Map ::= updateMap(Map, Map)            [function, total, hook(MAP.updateAll)]

Multiple map removal

You can remove a Set of keys from a map in O(N*log(M)) time (where M is the
size of the Map and N is the size of the Set), or effectively linear.

  syntax Map ::= removeAll(Map, Set)            [function, total, hook(MAP.removeAll)]

Map keys (as Set)

You can get a Set of all the keys in a Map in O(N) time.

  syntax Set ::= keys(Map)                      [function, total, hook(MAP.keys)]

Map keys (as List)

You can get a List of all the keys in a Map in O(N) time.

  syntax List ::= "keys_list" "(" Map ")"       [function, hook(MAP.keys_list)]

Map key membership

You can check whether a key is present in a map in O(1) time.

  syntax Bool ::= KItem "in_keys" "(" Map ")"       [function, total, hook(MAP.in_keys)]

Map values (as List)

You can get a List of all the values in a map in O(N) time.

  syntax List ::= values(Map)                   [function, hook(MAP.values)]

Map size

You can get the number of key/value pairs in a map in O(1) time.

  syntax Int ::= size(Map)                      [function, total, hook(MAP.size), symbol(sizeMap)]

Map inclusion

You can determine whether a Map is a subset of another Map (that is, whether
every key/value pair of the first also occurs in the second) in O(N)
time (where N is the size of the first map). Only keys that are bound to the
same value are considered equal.

  syntax Bool ::= Map "<=Map" Map               [function, total, hook(MAP.inclusion)]

Map choice

You can get an arbitrarily chosen key of a Map in O(1) time. The same key
will always be returned for the same map, but no guarantee is given that two
different maps will return the same element, even if they are similar.

  syntax KItem ::= choice(Map)                      [function, hook(MAP.choice), symbol(Map:choice)]

Implementation of Maps

The remainder of this section contains lemmas used by the Java and Haskell
backend to simplify expressions of sort Map. They do not affect the semantics
of maps, merely describing additional rules that the backend can use to
simplify terms.

endmodule

module MAP-KORE-SYMBOLIC [symbolic,haskell]
  imports MAP
  imports private K-EQUAL
  imports private BOOL

  rule #Ceil(@M:Map [@K:KItem]) => {(@K in_keys(@M)) #Equals true} #And #Ceil(@M) #And #Ceil(@K) [simplification]

  // Symbolic update

  // Adding the definedness condition `notBool (K in_keys(M))` in the ensures clause of the following rule would be redundant
  // because K also appears in the rhs, preserving the case when it's #Bottom.
  rule (K |-> _ M:Map) [ K <- V ] => (K |-> V M) [simplification]
  rule M:Map [ K <- V ] => (K |-> V M) requires notBool (K in_keys(M)) [simplification]
  rule M:Map [ K <- _ ] [ K <- V ] => M [ K <- V ] [simplification]
  // Adding the definedness condition `notBool (K1 in_keys(M))` in the ensures clause of the following rule would be redundant
  // because K1 also appears in the rhs, preserving the case when it's #Bottom.
  rule (K1 |-> V1 M:Map) [ K2 <- V2 ] => (K1 |-> V1 (M [ K2 <- V2 ])) requires K1 =/=K K2 [simplification]

  // Symbolic remove
  rule (K |-> _ M:Map) [ K <- undef ] => M ensures notBool (K in_keys(M)) [simplification]
  rule M:Map [ K <- undef ] => M requires notBool (K in_keys(M)) [simplification]
  // Adding the definedness condition `notBool (K1 in_keys(M))` in the ensures clause of the following rule would be redundant
  // because K1 also appears in the rhs, preserving the case when it's #Bottom.
  rule (K1 |-> V1 M:Map) [ K2 <- undef ] => (K1 |-> V1 (M [ K2 <- undef ])) requires K1 =/=K K2 [simplification]

  // Symbolic lookup
  rule (K  |->  V M:Map) [ K ]  => V ensures notBool (K in_keys(M)) [simplification]
  rule (K1 |-> _V M:Map) [ K2 ] => M [K2] requires K1 =/=K K2 ensures notBool (K1 in_keys(M)) [simplification]
  rule (_MAP:Map [ K  <-  V1 ]) [ K ]  => V1 [simplification]
  rule ( MAP:Map [ K1 <- _V1 ]) [ K2 ] => MAP [ K2 ] requires K1 =/=K K2 [simplification]

  rule (K  |->  V M:Map) [  K ] orDefault _ => V ensures notBool (K in_keys(M)) [simplification]
  rule (K1 |-> _V M:Map) [ K2 ] orDefault D => M [K2] orDefault D requires K1 =/=K K2 ensures notBool (K1 in_keys(M)) [simplification]
  rule (_MAP:Map [ K  <-  V1 ]) [ K ] orDefault _ => V1 [simplification]
  rule ( MAP:Map [ K1 <- _V1 ]) [ K2 ] orDefault D => MAP [ K2 ] orDefault D requires K1 =/=K K2 [simplification]
  rule .Map [ _ ] orDefault D => D [simplification]

  // Symbolic in_keys
  rule K in_keys(_M [ K <- undef ]) => false [simplification]
  rule K in_keys(_M [ K <- _ ]) => true [simplification]
  rule K1 in_keys(M [ K2 <- _ ]) => true requires K1 ==K K2 orBool K1 in_keys(M) [simplification]
  rule K1 in_keys(M [ K2 <- _ ]) => K1 in_keys(M) requires K1 =/=K K2 [simplification]

  rule {false #Equals @Key in_keys(.Map)} => #Ceil(@Key) [simplification]
  rule {@Key in_keys(.Map) #Equals false} => #Ceil(@Key) [simplification]
  rule {false #Equals @Key in_keys(Key' |-> Val @M)} => #Ceil(@Key) #And #Ceil(Key' |-> Val @M) #And #Not({@Key #Equals Key'}) #And {false #Equals @Key in_keys(@M)} [simplification]
  rule {@Key in_keys(Key' |-> Val @M) #Equals false} => #Ceil(@Key) #And #Ceil(Key' |-> Val @M) #And #Not({@Key #Equals Key'}) #And {@Key in_keys(@M) #Equals false} [simplification]

/*
// The rule below is automatically generated by the frontend for every sort
// hooked to MAP.Map. It is left here to serve as documentation.

  rule #Ceil(@M:Map (@K:KItem |-> @V:KItem)) => {(@K in_keys(@M)) #Equals false} #And #Ceil(@M) #And #Ceil(@K) #And #Ceil(@V)
    [simplification]
*/
endmodule

module MAP-SYMBOLIC
  imports MAP-KORE-SYMBOLIC
endmodule

Range Maps

Provided here is the syntax of an implementation of immutable, associative,
commutative range maps from Int to KItem. This type is hooked to an
implementation of range maps provided by the LLVM backend.
Currently, this type is not supported by other backends.
Although the underlying range map data structure supports any key sort, the
current implementation by the backend only supports Int keys due to
limitations of the underlying ordering function.

module RANGEMAP
  imports private BOOL-SYNTAX
  imports private INT-SYNTAX
  imports private LIST
  imports private SET

Range, bounded inclusively below and exclusively above.

  syntax Range ::= "[" KItem "," KItem ")"    [symbol(RangeMap:Range)]

  syntax RangeMap [hook(RANGEMAP.RangeMap)]

Range map concatenation

The RangeMap sort represents a map whose keys are stored as ranges, bounded
inclusively below and exclusively above. Contiguous or overlapping ranges that
map to the same value are merged into a single range.

You can construct a new RangeMap consisting of range/value pairs of two
RangeMaps. If the RangeMaps have overlapping ranges an exception will be
thrown during concrete execution. This operation is O(N*log(M)) (where N is
the size of the smaller map and M is the size of the larger map).

  syntax RangeMap ::= RangeMap RangeMap                        [left, function, hook(RANGEMAP.concat), symbol(_RangeMap_), assoc, comm, unit(.RangeMap), element(_r|->_), index(0), format(%1%n%2)]
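
The merging and overlap behavior described above can be modeled outside K. The following Python sketch is illustrative only and is not the LLVM backend's implementation. The representation (a list of hypothetical `(lo, hi, value)` tuples, each standing for the half-open range `[lo, hi)`) and the name `concat` are assumptions made for this example, and the sketch makes no attempt to match the stated complexity.

```python
from typing import Any, List, Tuple

# Hypothetical model: one entry covers the half-open interval [lo, hi).
Entry = Tuple[int, int, Any]


def concat(a: List[Entry], b: List[Entry]) -> List[Entry]:
    """Model of range map concatenation over two lists of entries."""
    merged: List[Entry] = []
    # Sort on the bounds only, so values never need to be comparable.
    for lo, hi, value in sorted(a + b, key=lambda e: (e[0], e[1])):
        if merged:
            prev_lo, prev_hi, prev_value = merged[-1]
            if lo < prev_hi:
                # Overlap: concatenation is undefined, which this model
                # reports as an exception.
                raise ValueError(
                    f"overlapping ranges [{prev_lo}, {prev_hi}) and [{lo}, {hi})"
                )
            if lo == prev_hi and value == prev_value:
                # Contiguous ranges bound to the same value collapse into one.
                merged[-1] = (prev_lo, hi, value)
                continue
        merged.append((lo, hi, value))
    return merged
```

In this model, `concat([(0, 5, "a")], [(5, 10, "a")])` produces the single entry `(0, 10, "a")`, while concatenating `[(0, 5, "a")]` with `[(3, 8, "a")]` raises `ValueError` because the two ranges overlap.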

Range map unit

The RangeMap with zero elements is represented by .RangeMap.

  syntax RangeMap ::= ".RangeMap"                         [function, total, hook(RANGEMAP.unit), symbol(.RangeMap)]

Range map elements

An element of a RangeMap is constructed via the r|-> operator. The range
of keys is on the left, and the value is on the right.

  syntax RangeMap ::= Range "r|->" KItem                      [function, hook(RANGEMAP.elementRng), symbol(_r|->_), injective]

  syntax priority _r|->_ > _RangeMap_ .RangeMap
  syntax non-assoc _r|->_

Range map lookup

You can look up the value associated with a key of a RangeMap in O(log(N))
time (where N is the size of the RangeMap). This will yield an exception
during concrete execution if the key is not in the range map.

  syntax KItem ::= RangeMap "[" KItem "]"                    [function, hook(RANGEMAP.lookup), symbol(RangeMap:lookup)]

Range map lookup with default

You can also look up the value associated with a key of a RangeMap using a
total function that assigns a specific default value if the key is not present
in the RangeMap. This operation is also O(log(N)) (where N is the size of
the range map).

  syntax KItem ::= RangeMap "[" KItem "]" "orDefault" KItem      [function, total, hook(RANGEMAP.lookupOrDefault), symbol(RangeMap:lookupOrDefault)]

Range map lookup for range of key

You can look up the range in which a key of a RangeMap is stored in
O(log(N)) time (where N is the size of the RangeMap). This will yield an
exception during concrete execution if the key is not in the range map.

  syntax Range ::= "find_range" "(" RangeMap "," KItem ")"                    [function, hook(RANGEMAP.find_range), symbol(RangeMap:find_range)]
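
The three lookup operations above share one search step: find the last range that starts at or before the key, then check that the key lies below its exclusive upper bound. The Python sketch below models this with a binary search. The class `RangeMapModel` and its method names are hypothetical, and the backend's actual data structure is not specified here.

```python
from bisect import bisect_right
from typing import Any, List, Tuple

Entry = Tuple[int, int, Any]  # hypothetical (lo, hi, value) for [lo, hi)


class RangeMapModel:
    """Illustrative model of lookups on non-overlapping half-open ranges."""

    def __init__(self, entries: List[Entry]) -> None:
        self.entries = sorted(entries, key=lambda e: e[0])
        self.starts = [lo for lo, _, _ in self.entries]

    def _index(self, key: int) -> int:
        # Last entry whose lower bound is <= key, found in O(log N).
        i = bisect_right(self.starts, key) - 1
        if i >= 0 and key < self.entries[i][1]:
            return i
        raise KeyError(key)

    def find_range(self, key: int) -> Tuple[int, int]:
        lo, hi, _ = self.entries[self._index(key)]
        return (lo, hi)

    def lookup(self, key: int) -> Any:
        return self.entries[self._index(key)][2]

    def lookup_or_default(self, key: int, default: Any) -> Any:
        # Total variant: a missing key yields the default, not an error.
        try:
            return self.lookup(key)
        except KeyError:
            return default
```

With `m = RangeMapModel([(0, 5, "a"), (10, 20, "b")])`, `m.find_range(4)` returns `(0, 5)` and `m.lookup(10)` returns `"b"`, whereas `m.lookup(5)` raises `KeyError` because the upper bound is exclusive.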

Range map update

You can insert a range/value pair into a RangeMap in O(log(N)) time (where N
is the size of the RangeMap). Any ranges adjacent to or overlapping with the
range to be inserted will be updated accordingly.

  syntax RangeMap ::= RangeMap "[" keyRange: Range "<-" value: KItem "]"           [function, symbol(RangeMap:update), hook(RANGEMAP.updateRng), prefer]

Range map delete

You can remove a range/value pair from a RangeMap in O(log(N)) time (where N
is the size of the RangeMap). If all or any part of the range is present in
the range map, it will be removed.

  syntax RangeMap ::= RangeMap "[" Range "<-" "undef" "]"     [function, hook(RANGEMAP.removeRng), symbol(_r[_<-undef])]
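
Removing a range that only partially covers a stored range leaves the uncovered parts in place. The Python sketch below models that splitting. The tuple representation and the name `remove_range` are assumptions for illustration, and the linear scan does not reflect the O(log(N)) bound stated above.

```python
from typing import Any, List, Tuple

Entry = Tuple[int, int, Any]  # hypothetical (lo, hi, value) for [lo, hi)


def remove_range(entries: List[Entry], lo: int, hi: int) -> List[Entry]:
    """Remove the half-open range [lo, hi) from a list of entries."""
    result: List[Entry] = []
    for e_lo, e_hi, value in entries:
        if e_hi <= lo or hi <= e_lo:
            # No overlap with the removed range: keep the entry unchanged.
            result.append((e_lo, e_hi, value))
            continue
        if e_lo < lo:
            # Part of the entry lies below the removed range.
            result.append((e_lo, lo, value))
        if hi < e_hi:
            # Part of the entry lies above the removed range.
            result.append((hi, e_hi, value))
    return result
```

For instance, `remove_range([(0, 10, "a")], 3, 5)` returns `[(0, 3, "a"), (5, 10, "a")]`: the stored range is split around the removed part.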

Range map difference

You can remove the range/value pairs in a RangeMap that are also present in
another RangeMap in O(max{M,N}*log(M)) time (where M is the size of the
first RangeMap and N is the size of the second RangeMap). Note that only
the parts of overlapping ranges whose value is the same in both range maps
will be removed.

  syntax RangeMap ::= RangeMap "-RangeMap" RangeMap                 [function, total, hook(RANGEMAP.difference)]

Multiple range map update

You can update a RangeMap by adding all the range/value pairs in the second
RangeMap in O(N*log(M+N)) time (where M is the size of the first RangeMap
and N is the size of the second RangeMap). If any ranges are overlapping,
the value from the second range map overwrites the value in the first for the
parts where ranges are overlapping. This function is total, unlike range map
concatenation, which is a partial function defined only on range maps with
non-overlapping ranges.

  syntax RangeMap ::= updateRangeMap(RangeMap, RangeMap)            [function, total, hook(RANGEMAP.updateAll)]

Multiple range map removal

You can remove a Set of ranges from a RangeMap in O(N*log(M)) time (where
M is the size of the RangeMap and N is the size of the Set). For every
range in the set, all or any part of it that is present in the range map will
be removed.

  syntax RangeMap ::= removeAll(RangeMap, Set)            [function, hook(RANGEMAP.removeAll)]

Range map keys (as Set)

You can get a Set of all the ranges in a RangeMap in O(N) time (where N
is the size of the RangeMap).

  syntax Set ::= keys(RangeMap)                      [function, total, hook(RANGEMAP.keys)]

Range map keys (as List)

You can get a List of all the ranges in a RangeMap in O(N) time (where N
is the size of the RangeMap).

  syntax List ::= "keys_list" "(" RangeMap ")"       [function, hook(RANGEMAP.keys_list)]

Range map key membership

You can check whether a key is present in a RangeMap in O(log(N)) time (where
N is the size of the RangeMap).

  syntax Bool ::= KItem "in_keys" "(" RangeMap ")"       [function, total, hook(RANGEMAP.in_keys)]

Range map values (as List)

You can get a List of all values in a RangeMap in O(N) time (where N is the
size of the RangeMap).

  syntax List ::= values(RangeMap)                   [function, hook(RANGEMAP.values)]

Range map size

You can get the number of range/value pairs in a RangeMap in O(1) time.

  syntax Int ::= size(RangeMap)                      [function, total, hook(RANGEMAP.size), symbol(sizeRangeMap)]

Range map inclusion

You can determine whether a RangeMap is a subset of another RangeMap
in O(M+N) time (where M is the size of the first RangeMap and N is the size
of the second RangeMap). Only keys within equal or overlapping ranges that
are bound to the same value are considered equal.

  syntax Bool ::= RangeMap "<=RangeMap" RangeMap               [function, total, hook(RANGEMAP.inclusion)]

Range map choice

You can get an arbitrarily chosen key of a RangeMap in O(1) time. The same
key will always be returned for the same range map, but no guarantee is given
that two different range maps will return the same element, even if they are
similar.

  syntax KItem ::= choice(RangeMap)                      [function, hook(RANGEMAP.choice), symbol(RangeMap:choice)]
endmodule

Sets

Provided here is the syntax of an implementation of immutable, associative,
commutative sets of KItem. This type is hooked to an implementation of sets
provided by the backend. For more information on matching on sets and allowable
patterns for doing so, refer to K's
user documentation.

module SET
  imports private INT-SYNTAX
  imports private BASIC-K

  syntax Set [hook(SET.Set)]

Set concatenation

The Set sort represents a mathematical set (a collection of unique items).
The sets are nilpotent, i.e., the concatenation of two sets containing elements
in common is #False (note however, this may be silently allowed during
concrete execution). If you intend to add an element to a set that might
already be present in the set, use the |Set operator instead.

The concatenation operator is O(N*log(M)) (where N is the size of the smaller
set and M is the size of the larger set) when it appears on the right hand
side. When it appears on the left hand side and all variables are bound, it is
O(N*log(M)) where M is the size of the set it is matching and N is the number
of elements being matched. When it appears on the left hand side containing
variables not bound elsewhere in the term, it is O(N^K) where N is the size of
the set it is matching and K is the number of unbound elements being matched.
In other words, one unbound variable is linear, two is quadratic, three is
cubic, etc.

  syntax Set ::= Set Set                  [left, function, hook(SET.concat), symbol(_Set_), assoc, comm, unit(.Set), idem, element(SetItem), format(%1%n%2)]

Set unit

The set with zero elements is represented by .Set.

  syntax Set ::= ".Set"                   [function, total, hook(SET.unit), symbol(.Set)]

Set elements

An element of a Set is constructed via the SetItem operator.

  syntax Set ::= SetItem(KItem)               [function, total, hook(SET.element), symbol(SetItem), injective]

Set union

You can compute the union of two sets in O(N*log(M)) time (where N is the size
of the smaller set). Note that the base of the logarithm is a relatively high
number and thus the time is effectively linear. The union consists of all the
elements present in either set.

  syntax Set ::= Set "|Set" Set              [left, function, total, hook(SET.union), comm]
  rule S1:Set |Set S2:Set => S1 (S2 -Set S1) [concrete]
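
The difference between concatenation and union can be made concrete with a small model. The Python sketch below treats concatenation as a partial operation on disjoint sets and defines union by following the rule above, `S1 |Set S2 => S1 (S2 -Set S1)`. The function names are hypothetical, and raising an exception is only one way to model an undefined result (as noted earlier, concrete execution may silently allow overlapping concatenation).

```python
from typing import FrozenSet, Hashable

KSet = FrozenSet[Hashable]


def concat(s1: KSet, s2: KSet) -> KSet:
    """Model of set concatenation: only defined for disjoint sets."""
    common = s1 & s2
    if common:
        # Modeled as an error; in K the result is #False (undefined).
        raise ValueError(f"sets share elements: {sorted(map(repr, common))}")
    return s1 | s2


def union(s1: KSet, s2: KSet) -> KSet:
    """Model of |Set, following S1 |Set S2 => S1 (S2 -Set S1)."""
    # Removing S1's elements from S2 first makes the operands disjoint,
    # so the concatenation is always defined.
    return concat(s1, s2 - s1)
```

Here `union(frozenset({1, 2}), frozenset({2, 3}))` yields `frozenset({1, 2, 3})`, while calling `concat` on the same two sets raises `ValueError`.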

Set intersection

You can compute the intersection of two sets in O(N*log(M)) time (where N
is the size of the smaller set), or effectively linear. The intersection
consists of all the elements present in both sets.

  syntax Set ::= intersectSet(Set, Set)   [function, total, hook(SET.intersection), comm]

Set complement

You can compute the relative complement of two sets in O(N*log(M)) time (where
N is the size of the second set), or effectively linear. This is the set of
elements in the first set that are not present in the second set.

  syntax Set ::= Set "-Set" Set           [function, total, hook(SET.difference), symbol(Set:difference)]

Set membership

You can compute whether an element is a member of a set in O(1) time.

  syntax Bool ::= KItem "in" Set              [function, total, hook(SET.in), symbol(Set:in)]

Set inclusion

You can determine whether a Set is a (not necessarily strict) subset of another
Set in O(N) time (where N is the size of the first set).

  syntax Bool ::= Set "<=Set" Set         [function, total, hook(SET.inclusion)]

Set size

You can get the number of elements (the cardinality) of a set in O(1) time.

  syntax Int ::= size(Set)                [function, total, hook(SET.size)]

Set choice

You can get an arbitrarily chosen element of a Set in O(1) time. The same
element will always be returned for the same set, but no guarantee is given
that two different sets will return the same element, even if they are similar.

  syntax KItem ::= choice(Set)                [function, hook(SET.choice), symbol(Set:choice)]
endmodule

Implementation of Sets

The following lemmas are simplifications that the Haskell backend can
apply to simplify expressions of sort Set.

module SET-KORE-SYMBOLIC [symbolic,haskell]
  imports SET
  imports private K-EQUAL
  imports private BOOL

  // Temporary rule for #Ceil simplification; should be generated in the front-end

// Matching for this version not implemented.
  // rule #Ceil(@S1:Set @S2:Set) =>
  //        {intersectSet(@S1, @S2) #Equals .Set} #And #Ceil(@S1) #And #Ceil(@S2)
  //   [simplification]

//simpler version
  rule #Ceil(@S:Set SetItem(@E:KItem)) =>
         {(@E in @S) #Equals false} #And #Ceil(@S) #And #Ceil(@E)
    [simplification]

  // -Set simplifications
  rule S              -Set .Set           => S          [simplification]
  rule .Set           -Set  _             => .Set       [simplification]
  rule SetItem(X)     -Set (S SetItem(X)) => .Set
                               ensures notBool (X in S) [simplification]
  rule S              -Set (S SetItem(X)) => .Set
                               ensures notBool (X in S) [simplification]
  rule (S SetItem(X)) -Set S              => SetItem(X)
                               ensures notBool (X in S) [simplification]
  rule (S SetItem(X)) -Set SetItem(X)     => S
                               ensures notBool (X in S) [simplification]
  // rule SetItem(X)     -Set S              => SetItem(X)
  //                            requires notBool (X in S)  [simplification]
  // rule (S1 SetItem(X)) -Set (S2 SetItem(X))  => S1 -Set S2
  //                             ensures notBool (X in S1)
  //                             andBool notBool (X in S2) [simplification]



  // |Set simplifications
  rule S    |Set .Set => S    [simplification, comm]
  rule S    |Set S    => S    [simplification]

  rule (S SetItem(X)) |Set SetItem(X) => S SetItem(X)
                             ensures notBool (X in S) [simplification, comm]
  // Currently disabled, see runtimeverification/haskell-backend#3301
  // rule (S SetItem(X)) |Set S          => S SetItem(X)
  //                            ensures notBool (X in S) [simplification, comm]

  // intersectSet simplifications
  rule intersectSet(.Set, _   ) => .Set    [simplification, comm]
  rule intersectSet( S  , S   ) =>  S      [simplification]

  rule intersectSet( S SetItem(X), SetItem(X))     => SetItem(X)
                                                        ensures notBool (X in S)      [simplification, comm]
  // Currently disabled, see runtimeverification/haskell-backend#3294
  // rule intersectSet( S SetItem(X) , S)             => S ensures notBool (X in S)      [simplification, comm]
  rule intersectSet( S1 SetItem(X), S2 SetItem(X)) => intersectSet(S1, S2) SetItem(X)
                                                        ensures notBool (X in S1)
                                                        andBool notBool (X in S2)     [simplification]

  // membership simplifications
  rule _E in .Set           => false   [simplification]
  rule E  in (S SetItem(E)) => true
              ensures notBool (E in S) [simplification]

// These two rules would be sound but impose a giant overhead on `in` evaluation:
  // rule E1 in (S SetItem(E2)) => true requires E1 in S
  //                                 ensures notBool (E2 in S) [simplification]
  // rule E1 in (S SetItem(E2)) => E1 in S requires E1 =/=K E2
  //                                 ensures notBool (E2 in S) [simplification]

  rule X in ((SetItem(X) S) |Set  _            ) => true
                                    ensures notBool (X in S) [simplification]
  rule X in ( _             |Set (SetItem(X) S)) => true
                                    ensures notBool (X in S) [simplification]

endmodule

module SET-SYMBOLIC
  imports SET-KORE-SYMBOLIC
endmodule

Lists

Provided here is the syntax of an implementation of immutable, associative
lists of KItem. This type is hooked to an implementation of lists provided
by the backend. For more information on matching on lists and allowable
patterns for doing so, refer to K's
user documentation.

module LIST
  imports private INT-SYNTAX
  imports private BASIC-K

  syntax List [hook(LIST.List)]

List concatenation

The List sort is an ordered collection that may contain duplicate elements.
Lists are backed by relaxed radix balanced trees, which means that they support
efficiently adding elements to both sides of the list, concatenating two lists,
indexing, and updating elements.

The concatenation operator is O(log(N)) (where N is the size of the longer
list) when it appears on the right hand side. When it appears on the left hand
side, it is O(N), where N is the number of elements matched on the front and
back of the list.

  syntax List ::= List List               [left, function, total, hook(LIST.concat), symbol(_List_), smtlib(smt_seq_concat), assoc, unit(.List), element(ListItem), update(List:set), format(%1%n%2)]

List unit

The list with zero elements is represented by .List.

  syntax List ::= ".List"                 [function, total, hook(LIST.unit), symbol(.List), smtlib(smt_seq_nil)]

List elements

An element of a List is constructed via the ListItem operator.

  syntax List ::= ListItem(KItem)             [function, total, hook(LIST.element), symbol(ListItem), smtlib(smt_seq_elem)]

List prepend

An element can be added to the front of a List using the pushList operator.

  syntax List ::= pushList(KItem, List)       [function, total, hook(LIST.push), symbol(pushList)]
  rule pushList(K::KItem, L1::List) => ListItem(K) L1

List indexing

You can get an element of a list by its integer offset in O(log(N)) time, or
effectively constant. Non-negative indices are 0-indexed from the beginning of
the list, and negative indices are -1-indexed from the end of the list. In
other words, 0 is the first element and -1 is the last element.

  syntax KItem ::= List "[" Int "]"           [function, hook(LIST.get), symbol(List:get)]
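
The indexing convention can be summarized by how an index maps to an offset from the front of the list. The Python sketch below models that mapping. The name `list_get` is hypothetical, and treating an out-of-range index as an error is an assumption based on the operation not being declared total.

```python
from typing import Any, Sequence


def list_get(items: Sequence[Any], index: int) -> Any:
    """Model of list indexing with front (>= 0) and back (< 0) indices."""
    n = len(items)
    # A negative index counts from the end: -1 is the last element.
    offset = index if index >= 0 else n + index
    if not 0 <= offset < n:
        # Assumption: out-of-range indices are undefined, modeled as an error.
        raise IndexError(f"index {index} out of range for list of size {n}")
    return items[offset]
```

For a three-element list `["a", "b", "c"]`, `list_get(xs, 0)` and `list_get(xs, -3)` both return `"a"`, and `list_get(xs, -1)` returns `"c"`.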

List update

You can create a new List with a new value at a particular index in
O(log(N)) time, or effectively constant.

  syntax List ::= List "[" index: Int "<-" value: KItem "]" [function, hook(LIST.update), symbol(List:set)]

List of identical elements

You can create a list with length elements, each containing value, in O(N)
time (where N is length).

  syntax List ::= makeList(length: Int, value: KItem) [function, hook(LIST.make)]

Multiple list update

You can create a new List which is equal to dest except the N elements
starting at index are replaced with the contents of src in O(N*log(K)) time
(where K is the size of dest and N is the size of src), or effectively linear.
Having index + N > K yields an exception.

  syntax List ::= updateList(dest: List, index: Int, src: List) [function, hook(LIST.updateAll)]
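
As an illustration of the bounds condition, the following Python sketch models the multiple list update on ordinary Python lists. The name `update_list` is hypothetical, and rejecting a negative index is an assumption of this sketch rather than documented behavior.

```python
from typing import Any, List


def update_list(dest: List[Any], index: int, src: List[Any]) -> List[Any]:
    """Model of updateList: overwrite len(src) elements of dest at index."""
    n, k = len(src), len(dest)
    if index < 0 or index + n > k:
        # index + N > K is documented to yield an exception; the negative
        # index check is an assumption of this model.
        raise IndexError(f"cannot write {n} elements at {index} into size {k}")
    # Build a new list; dest itself is left unchanged (lists are immutable in K).
    return dest[:index] + src + dest[index + n:]
```

For example, `update_list([0, 1, 2, 3, 4], 1, [9, 9])` returns `[0, 9, 9, 3, 4]`, and writing two elements at index 4 of the same list raises `IndexError`.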

List fill

You can create a new List where the length elements starting at index
are replaced with value, in O(length*log(N)) time, or effectively linear.

  syntax List ::= fillList(List, index: Int, length: Int, value: KItem) [function, hook(LIST.fill)]

List slicing

You can compute a new List by removing fromFront elements from the front
of the list and fromBack elements from the back of the list in
O((fromFront+fromBack)*log(N)) time, or effectively linear.

  syntax List ::= range(List, fromFront: Int, fromBack: Int)   [function, hook(LIST.range), symbol(List:range)]
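
Note that both arguments of this operation count elements to drop, not positions. The Python sketch below models this with a slice. The name `list_range` is hypothetical, and the error raised when more elements are dropped than the list contains is an assumption, since the operation is not declared total.

```python
from typing import Any, List


def list_range(items: List[Any], from_front: int, from_back: int) -> List[Any]:
    """Model of range: drop from_front leading and from_back trailing items."""
    if from_front < 0 or from_back < 0 or from_front + from_back > len(items):
        # Assumption: dropping more elements than exist is undefined.
        raise IndexError("cannot remove more elements than the list holds")
    return items[from_front:len(items) - from_back]
```

With this model, `list_range([1, 2, 3, 4, 5], 1, 2)` returns `[2, 3]`: one element is dropped from the front and two from the back.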

List membership

You can compute whether an element is in a list in O(N) time. For repeated
comparisons, it is much better to first convert to a set using List2Set.

  syntax Bool ::= KItem "in" List             [function, total, hook(LIST.in), symbol(_inList_)]

List size

You can get the number of elements of a list in O(1) time.

  syntax Int ::= size(List)               [function, total, hook(LIST.size), symbol(sizeList), smtlib(smt_seq_len)]
endmodule

Collection Conversions

It is possible to convert from a List to a Set or from a Set to a List.
Converting from a List to a Set and back will not provide the same list;
duplicates will have been removed and the list may be reordered. Converting
from a Set to a List and back will generate the same set.

Note that because sets are unordered and lists are ordered, converting from a
Set to a List will generate some arbitrary ordering of elements, which may
or may not match the natural ordering you might assume. Two
equal sets are guaranteed to generate the same ordering, but no guarantee is
otherwise provided about what the ordering will be. In particular, adding an
element to a set may completely reorder the elements already in the set, when
it is converted to a list.

module COLLECTIONS
  imports LIST
  imports SET
  imports MAP

  syntax List ::= Set2List(Set) [function, total, hook(SET.set2list)]
  syntax Set ::= List2Set(List) [function, total, hook(SET.list2set)]

endmodule
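
The round-trip properties described above can be checked on a small model. In the Python sketch below, `list_to_set` and `set_to_list` are hypothetical stand-ins for List2Set and Set2List. Sorting by `repr` is only a placeholder for "some deterministic ordering" and is an assumption of this sketch, not the ordering any backend uses.

```python
from typing import Any, FrozenSet, List


def list_to_set(items: List[Any]) -> FrozenSet[Any]:
    """Model of List2Set: duplicates and ordering are discarded."""
    return frozenset(items)


def set_to_list(s: FrozenSet[Any]) -> List[Any]:
    """Model of Set2List: equal sets always give the same ordering."""
    # Placeholder ordering; the real ordering is backend-specific.
    return sorted(s, key=repr)


original = [3, 1, 3, 2]
round_trip = set_to_list(list_to_set(original))  # duplicates are gone
```

In this model `round_trip` is `[1, 2, 3]`, which differs from `original`, while converting any set to a list and back gives the same set again.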

Booleans

Provided here is the syntax of an implementation of boolean algebra in K.
This type is hooked to an implementation of booleans provided by the backend.
Note that this algebra is different from the builtin truth in matching logic.
You can, however, convert from the truth of the Bool sort to the truth in
matching logic via the expression {B #Equals true}.

The boolean values are true and false.

module SORT-BOOL
  syntax Bool [hook(BOOL.Bool)]
endmodule

module BOOL-SYNTAX
  imports SORT-BOOL
  syntax Bool ::= "true"  [token]
  syntax Bool ::= "false" [token]
endmodule

module BOOL-COMMON
  imports private BASIC-K
  imports BOOL-SYNTAX

Basic boolean arithmetic

You can:

  • Negate a boolean value.
  • AND two boolean values.
  • XOR two boolean values.
  • OR two boolean values.
  • IMPLIES two boolean values (i.e., P impliesBool Q is the same as
    notBool P orBool Q)
  • Check equality of two boolean values.
  • Check inequality of two boolean values.

Note that only andThenBool and orElseBool are short-circuiting. andBool
and orBool may be short-circuited in concrete backends, but in symbolic
backends, both arguments will be evaluated.

  syntax Bool ::= "notBool" Bool          [function, total, symbol(notBool_), smt-hook(not), group(boolOperation), hook(BOOL.not)]
                > Bool "andBool" Bool     [function, total, symbol(_andBool_), left, smt-hook(and), group(boolOperation), hook(BOOL.and)]
                | Bool "andThenBool" Bool [function, total, symbol(_andThenBool_), left, smt-hook(and), group(boolOperation), hook(BOOL.andThen)]
                | Bool "xorBool" Bool     [function, total, symbol(_xorBool_), left, smt-hook(xor), group(boolOperation), hook(BOOL.xor)]
                | Bool "orBool" Bool      [function, total, symbol(_orBool_), left, smt-hook(or), group(boolOperation), hook(BOOL.or)]
                | Bool "orElseBool" Bool  [function, total, symbol(_orElseBool_), left, smt-hook(or), group(boolOperation), hook(BOOL.orElse)]
                | Bool "impliesBool" Bool [function, total, symbol(_impliesBool_), left, smt-hook(=>), group(boolOperation), hook(BOOL.implies)]
                > left:
                  Bool "==Bool" Bool      [function, total, symbol(_==Bool_), left, comm, smt-hook(=), hook(BOOL.eq)]
                | Bool "=/=Bool" Bool     [function, total, symbol(_=/=Bool_), left, comm, smt-hook(distinct), hook(BOOL.ne)]
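
To make the short-circuiting distinction concrete, the Python sketch below models operands as zero-argument functions so that evaluation can be observed. The names `and_bool`, `and_then_bool`, `or_else_bool`, and `traced` are hypothetical. The strict `and_bool` models the case where both arguments are evaluated, which, as noted above, is what symbolic backends do.

```python
from typing import Callable, List

Thunk = Callable[[], bool]


def and_bool(left: Thunk, right: Thunk) -> bool:
    """Strict conjunction: both operands are always evaluated."""
    l = left()
    r = right()
    return l and r


def and_then_bool(left: Thunk, right: Thunk) -> bool:
    """Short-circuiting conjunction: right runs only if left is true."""
    return right() if left() else False


def or_else_bool(left: Thunk, right: Thunk) -> bool:
    """Short-circuiting disjunction: right runs only if left is false."""
    return True if left() else right()


def traced(name: str, value: bool, log: List[str]) -> Thunk:
    """Build an operand that records its name when it is evaluated."""
    def thunk() -> bool:
        log.append(name)
        return value
    return thunk
```

Evaluating `and_bool(traced("a", False, log), traced("b", True, log))` records both `"a"` and `"b"` in `log`, whereas the same call with `and_then_bool` records only `"a"`.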

Implementation of Booleans

The remainder of this section consists of an implementation in K of the
operations listed above.

  rule notBool true => false
  rule notBool false => true

  rule true andBool B:Bool => B:Bool
  rule B:Bool andBool true => B:Bool [simplification]
  rule false andBool _:Bool => false
  rule _:Bool andBool false => false [simplification]

  rule true andThenBool K::Bool => K
  rule K::Bool andThenBool true => K [simplification]
  rule false andThenBool _ => false
  rule _ andThenBool false => false  [simplification]

  rule false xorBool B:Bool => B:Bool
  rule B:Bool xorBool false => B:Bool [simplification]
  rule B:Bool xorBool B:Bool => false

  rule true orBool _:Bool => true
  rule _:Bool orBool true => true [simplification]
  rule false orBool B:Bool => B
  rule B:Bool orBool false => B   [simplification]

  rule true orElseBool _ => true
  rule _ orElseBool true => true     [simplification]
  rule false orElseBool K::Bool => K
  rule K::Bool orElseBool false => K [simplification]

  rule true impliesBool B:Bool => B
  rule false impliesBool _:Bool => true
  rule _:Bool impliesBool true => true       [simplification]
  rule B:Bool impliesBool false => notBool B [simplification]

  rule B1:Bool =/=Bool B2:Bool => notBool (B1 ==Bool B2)
endmodule

module BOOL-KORE [symbolic]
  imports BOOL-COMMON

  rule {true #Equals notBool @B} => {false #Equals @B} [simplification]
  rule {notBool @B #Equals true} => {@B #Equals false} [simplification]
  rule {false #Equals notBool @B} => {true #Equals @B} [simplification]
  rule {notBool @B #Equals false} => {@B #Equals true} [simplification]

  rule {true #Equals @B1 andBool @B2} => {true #Equals @B1} #And {true #Equals @B2} [simplification]
  rule {@B1 andBool @B2 #Equals true} => {@B1 #Equals true} #And {@B2 #Equals true} [simplification]
  rule {false #Equals @B1 orBool @B2} => {false #Equals @B1} #And {false #Equals @B2} [simplification]
  rule {@B1 orBool @B2 #Equals false} => {@B1 #Equals false} #And {@B2 #Equals false} [simplification]
endmodule

module BOOL
  imports BOOL-COMMON
  imports BOOL-KORE
endmodule

Integers

Provided here is the syntax of an implementation of arbitrary-precision
integer arithmetic in K. This type is hooked to an implementation of integers
provided by the backend. For a fixed-width integer type, see the MINT module
below.

The UNSIGNED-INT-SYNTAX module provides a syntax of whole numbers in K.
This is useful because often programming languages implement the sign of an
integer as a unary operator rather than part of the lexical syntax of integers.
However, you can also directly reference integers with a sign using the
INT-SYNTAX module.

module UNSIGNED-INT-SYNTAX
  syntax Int [hook(INT.Int)]
  syntax Int ::= r"[0-9]+" [prefer, token, prec(2)]
endmodule

module INT-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  syntax Int ::= r"[\\+\\-]?[0-9]+" [prefer, token, prec(2)]
endmodule

module INT-COMMON
  imports INT-SYNTAX
  imports private BOOL

Integer arithmetic

You can:

  • Compute the bitwise complement ~Int of an integer value in two's complement.
  • Compute the exponentiation ^Int of two integers.
  • Compute the exponentiation of two integers modulo another integer (^%Int).
    A ^%Int B C is equal in value to (A ^Int B) %Int C, but has a better
    asymptotic complexity.
  • Compute the product *Int of two integers.
  • Compute the quotient /Int or modulus %Int of two integers using
    t-division, which rounds towards zero. Division by zero is #False.
  • Compute the quotient divInt or modulus modInt of two integers using
    Euclidean division, in which the remainder is always non-negative. Division
    by zero is #False.
  • Compute the sum +Int or difference -Int of two integers.
  • Compute the arithmetic right shift >>Int of two integers. Shifting by a
    negative quantity is #False.
  • Compute the left shift <<Int of two integers. Shifting by a negative
    quantity is #False.
  • Compute the bitwise and &Int of two integers in two's complement.
  • Compute the bitwise xor xorInt of two integers in two's complement.
  • Compute the bitwise inclusive-or |Int of two integers in two's complement.

  syntax Int ::= "~Int" Int                     [function, symbol(~Int_), total, hook(INT.not), smtlib(notInt)]
               > left:
                 Int "^Int" Int                 [function, symbol(_^Int_), left, smt-hook(^), hook(INT.pow)]
               | Int "^%Int" Int Int            [function, symbol(_^%Int__), left, smt-hook((mod (^ #1 #2) #3)), hook(INT.powmod)]
               > left:
                 Int "*Int" Int                 [function, total, symbol(_*Int_), left, comm, smt-hook(*), hook(INT.mul)]
               /* FIXME: translate /Int and %Int into smtlib */
               /* /Int and %Int implement t-division, which rounds towards 0. SMT hooks need to convert from Euclidean division operations */
               | Int "/Int" Int                 [function, symbol(_/Int_), left,
                                                 smt-hook((ite (or (= 0 (mod #1 #2)) (>= #1 0)) (div #1 #2) (ite (> #2 0) (+ (div #1 #2) 1) (- (div #1 #2) 1)))),
                                                 hook(INT.tdiv)]
               | Int "%Int" Int                 [function, symbol(_%Int_), left,
                                                 smt-hook((ite (or (= 0 (mod #1 #2)) (>= #1 0)) (mod #1 #2) (ite (> #2 0) (- (mod #1 #2) #2) (+ (mod #1 #2) #2)))),
                                                 hook(INT.tmod)]
               /* divInt and modInt implement e-division according to the Euclidean division theorem, therefore the remainder is always non-negative */
               | Int "divInt" Int               [function, symbol(_divInt_), left, smt-hook(div), hook(INT.ediv)]
               | Int "modInt" Int               [function, symbol(_modInt_), left, smt-hook(mod), hook(INT.emod)]
               > left:
                 Int "+Int" Int                 [function, total, symbol(_+Int_), left, comm, smt-hook(+), hook(INT.add)]
               | Int "-Int" Int                 [function, total, symbol(_-Int_), left, smt-hook(-), hook(INT.sub)]
               > left:
                 Int ">>Int" Int                [function, symbol(_>>Int_), left, hook(INT.shr), smtlib(shrInt)]
               | Int "<<Int" Int                [function, symbol(_<<Int_), left, hook(INT.shl), smtlib(shlInt)]
               > left:
                 Int "&Int" Int                 [function, total, symbol(_&Int_), left, comm, hook(INT.and), smtlib(andInt)]
               > left:
                 Int "xorInt" Int               [function, total, symbol(_xorInt_), left, comm, hook(INT.xor), smtlib(xorInt)]
               > left:
                 Int "|Int" Int                 [function, total, symbol(_|Int_), left, comm, hook(INT.or), smtlib(orInt)]
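
The two division conventions differ only when the dividend is negative and the division is inexact. The Python sketch below models both and also transcribes the `smt-hook` expressions attached to /Int and %Int above, which express t-division in terms of SMT-LIB's Euclidean `div` and `mod`. All function names are hypothetical. Division by zero surfaces as Python's `ZeroDivisionError`, standing in for #False.

```python
def tdiv(a: int, b: int) -> int:
    """Model of /Int: quotient rounded towards zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def tmod(a: int, b: int) -> int:
    """Model of %Int: remainder matching tdiv (takes the sign of a)."""
    return a - b * tdiv(a, b)


def emod(a: int, b: int) -> int:
    """Model of modInt: remainder r with 0 <= r < |b|."""
    return a % abs(b)


def ediv(a: int, b: int) -> int:
    """Model of divInt: quotient q with a == b * q + emod(a, b)."""
    return (a - emod(a, b)) // b  # exact division


def smt_tdiv(a: int, b: int) -> int:
    """Transcription of the smt-hook for /Int."""
    if emod(a, b) == 0 or a >= 0:
        return ediv(a, b)
    return ediv(a, b) + 1 if b > 0 else ediv(a, b) - 1


def smt_tmod(a: int, b: int) -> int:
    """Transcription of the smt-hook for %Int."""
    if emod(a, b) == 0 or a >= 0:
        return emod(a, b)
    return emod(a, b) - b if b > 0 else emod(a, b) + b
```

For -7 divided by 2, the t-division model gives quotient -3 and remainder -1, while the Euclidean model gives quotient -4 and remainder 1. Over small operands the transcribed SMT expressions agree with `tdiv` and `tmod`.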

Integer minimum and maximum

You can compute the minimum (minInt) and maximum (maxInt) of two integers.

  syntax Int ::= "minInt" "(" Int "," Int ")"   [function, total, smt-hook((ite (< #1 #2) #1 #2)), hook(INT.min)]
               | "maxInt" "(" Int "," Int ")"   [function, total, smt-hook((ite (< #1 #2) #2 #1)), hook(INT.max)]

Absolute value

You can compute the absolute value absInt of an integer.

  syntax Int ::= absInt ( Int )                 [function, total, smt-hook((ite (< #1 0) (- 0 #1) #1)), hook(INT.abs)]

Log base 2

You can compute the log base 2, rounded towards zero, of an integer. For a
positive integer, this equals the index of the highest bit set in its binary
representation. Log base 2 of zero or a negative number is #False.