K

K is a rewrite-based
executable semantic framework in which programming languages, type
systems, and formal analysis tools can be defined using configurations
and rules. Configurations organize the state in units called cells,
which are labeled and can be nested. K rewrite rules make it explicit
which parts of the term are read-only, write-only, read-write, or
unused. This makes K suitable for defining truly concurrent languages
even in the presence of sharing. Computations are represented as
syntactic extensions of the original language abstract syntax, using a
nested list structure which sequentializes computational tasks, such
as program fragments. Computations are like any other terms in a
rewriting environment: they can be matched, moved from one place to
another, modified, or deleted. This makes K suitable for defining
control-intensive features such as abrupt termination, exceptions, or
call/cc.

K Tool Download

  • The provided K Tool Binaries are supported on Linux, OS X, and Windows. Other platforms may or may not work correctly. We welcome reports about usability on unsupported platforms and bug reports on supported ones.
  • Try our Editor Support page for links to K syntax highlighting definitions for various popular editors/IDEs. Please feel free to contribute.
  • The source code (Java) is available on GitHub, where you can also report bugs (please do so).

Learn K

  • K webpage at UAIC (Romania).
  • Matching logic webpage at UIUC (USA).
  • Online K Discussion Channel for K users (Slack & Riot). This is the recommended way to ask questions about K and interact with the K community.
  • Stackoverflow for general questions to the K user community (use the channel above if you want quick answers).

K Tool Binaries

Download the latest stable release or prior releases.

K Tutorial

The purpose of this series of lessons is to teach developers how to program in
K. While the primary use of K is in specifying the operational semantics of
programming languages, this tutorial is agnostic about how that knowledge of K
is applied. For a more detailed tutorial explaining the basic principles of
programming language design, refer to the
K PL Tutorial. Note that the K PL Tutorial is currently somewhat out of date.

This K tutorial is a work in progress. Many lessons are currently simply
placeholders for future content.

To start the K tutorial, begin with
Section 1: Basic K Concepts.

Section 1: Basic K Concepts

The goal of this first section of the K tutorial is to teach the basic
principles of K to someone with no prior experience with K as a programming
language. However, it is not written for complete beginners to programming.
We assume that the reader has a firm grounding in computer science broadly,
and that they have experience writing code in functional programming
languages.

By the end of this section, the reader ought to be able to write specifications
of simple languages in K, use these specifications to generate a fast
interpreter for their programming language, as well as write basic deductive
program verification proofs over programs in their language. This should give
them the theoretical grounding they need to begin expanding their knowledge
of K in Section 2: Intermediate K Concepts.

To begin this section, refer to
Lesson 1.1: Setting up a K Environment.

Lesson 1.1: Setting up a K Environment

The first step to learning K is to install K on your system, and configure your
editor for K development.

Installing K

You have two options for how to install K, depending on how you intend to
interact with the K codebase. If you are solely a user of K, and have no
interest in developing or making changes to K, you most likely will want to
install one of our binary releases of K. However, if you are going to be a K
developer, or simply want to build K from source, you should follow the
instructions for a source build of K.

Installing K from a binary release

K is developed as a rolling release, with each change to K that passes our
CI infrastructure being deployed on GitHub for download. The latest release of
K can be downloaded here.
This page also contains information on how to install K. It is recommended
that you fully uninstall the old version of K prior to installing the new one,
as K does not maintain entries in package manager databases, with the exception
of Homebrew on macOS.

Installing K from source

You can clone K from GitHub with the following Git command:

git clone https://github.com/runtimeverification/k --recursive

Instructions on how to build K from source can be found
here.

Configuring your editor

K maintains a set of scripts for a variety of text editors, including vim and
emacs, in various states of maintenance. You can download these scripts with
the following Git command:

git clone https://github.com/kframework/k-editor-support

Because K allows users to define their own grammars for parsing K itself,
not all features of K can be highlighted reliably. However, at the cost of
occasionally highlighting things incorrectly, these scripts give quite good
results in many cases. That said, some of the editor scripts in the
above repository are fairly out of date. If you manage to improve them, we
welcome pull requests into the repository.

Troubleshooting

If you have problems installing K, we encourage you to reach out to us. If you
follow the above install instructions and run into a problem, you can
create a bug report on GitHub.

Next lesson

Once you have set up K on your system to your satisfaction, you can continue to
Lesson 1.2: Basics of Functional K.

Lesson 1.2: Basics of Functional K

The purpose of this lesson is to explain the basics of productions and
rules in K. These are two types of K sentences. A K file consists of
one or more requires or modules in K. Each module consists of one or
more imports or sentences. For more information on requires, modules, and
sentences, refer to Lesson 1.5. However, for the time
being, just think of a module as a container for sentences, and don't worry
about requires or imports just yet.

Our first K program

To start with, input the following program into your editor as file
lesson-02-a.k:

module LESSON-02-A

  syntax Color ::= Yellow() | Blue()
  syntax Fruit ::= Banana() | Blueberry()
  syntax Color ::= colorOf(Fruit) [function]

  rule colorOf(Banana()) => Yellow()
  rule colorOf(Blueberry()) => Blue()

endmodule

Save this file and then run:

kompile lesson-02-a.k

kompile is K's compiler. By default, it takes a program or specification
written in K and compiles it into an interpreter for that input. Right now we
are compiling a single file. A set of K files that are compiled together are
called a K definition. We will cover multiple file K definitions later on.
kompile will output a directory containing everything needed to execute
programs and perform proofs using that definition. In this case, kompile will
(by default) create the directory lesson-02-a-kompiled under the current
directory.

Now, save the following input file in your editor as banana.color in the same
directory as lesson-02-a.k:

colorOf(Banana())

We can now evaluate this K term by running (from the same directory):

krun banana.color

krun will use the interpreter generated by the first call to kompile to
execute this program.

You will get the following output:

<k>
  Yellow ( ) ~> .
</k>

For now, don't worry about the <k>, </k>, or ~> . portions of this
output.

You can also execute small programs directly by specifying them on the command
line instead of putting them in a file. For example, the same program above
could also have been executed by running the following command:

krun -cPGM='colorOf(Banana())'

Now, let's look at what this definition and program did.

Productions, Constructors, and Functions

The first thing to realize is that this K definition contains 5 productions.
Productions are introduced with the syntax keyword, followed by a sort,
followed by the ::= operator, followed by the definition of one or more
productions, separated by the | operator. There are different
types of productions, but for now we only care about constructors and
functions. Each declaration separated by the | operator is individually
a single production, and the | symbol simply groups together productions that
have the same sort. For example, we could equally have written an identical K
definition like so:

module LESSON-02-B

  syntax Color ::= Yellow()
  syntax Color ::= Blue()
  syntax Fruit ::= Banana()
  syntax Fruit ::= Blueberry()
  syntax Color ::= colorOf(Fruit) [function]

  rule colorOf(Banana()) => Yellow()
  rule colorOf(Blueberry()) => Blue()

endmodule

Or even:

module LESSON-02-C

  syntax Color ::= Yellow()
                 | Blue()
                 | colorOf(Fruit) [function]
  syntax Fruit ::= Banana()
                 | Blueberry()

  rule colorOf(Banana()) => Yellow()
  rule colorOf(Blueberry()) => Blue()

endmodule

Each of the production types named above has the same underlying syntax,
but context and attributes are used to distinguish between the different
types. Tokens, brackets, lists, macros, aliases, and anywhere productions will
be covered in a later lesson, but this lesson does introduce us to constructors
and functions. Yellow(), Blue(), Banana(), and Blueberry() are
constructors. You can think of a constructor like a constructor for an
algebraic data type, if you're familiar with a functional language. The data
type itself is the sort that appears on the left of the ::= operator. Sorts
in K consist of uppercase identifiers.

Constructors can have arguments, but these ones do not. We will cover the
syntax of productions in detail in the next lesson, but for now, you can write
a production with no arguments as an uppercase or lowercase identifier followed
by the () operator.

A function is distinguished from a constructor by the presence of the
function attribute. Attributes appear in a comma separated list between
square brackets after any sentence, including both productions and rules.
Various attributes with built-in meanings exist in K and will be discussed
throughout the tutorial.

Exercise

Use krun to compute the return value of the colorOf function on a
Blueberry().

Rules, Matching, and Variables

Functions in K are given definitions using rules. A rule begins with the rule
keyword and contains at least one rewrite operator. The rewrite operator
is represented by the syntax =>. The rewrite operator is one of the built-in
productions in K, and we will discuss in more detail how it can be used in
future lessons, but for now, you can think of a rule as consisting of a
left-hand side and a right-hand side, separated by the rewrite
operator. On the left-hand side is the name of the function and zero or more
patterns corresponding to the parameters of the function. On the right-hand
side is another pattern. The meaning of the rule is relatively simple, having
defined these components. If the function is called with arguments that
match the patterns on the left-hand side, then the return value of the
function is the pattern on the right-hand side.

For example, in the above example, if the argument of the colorOf function
is Banana(), then the return value of the function is Yellow().

So far we have introduced that a constructor is a type of pattern in K. We
will introduce more complex patterns in later lessons, but there is one other
type of basic pattern: the variable. A variable, syntactically, consists
of an uppercase identifier. However, unlike a constructor, a variable will
match any term, with one exception: two variables with the same name
must match the same term.

Here is a more complex example (lesson-02-d.k):

module LESSON-02-D

  syntax Container ::= Jar(Fruit)
  syntax Fruit ::= Apple() | Pear()

  syntax Fruit ::= contentsOfJar(Container) [function]

  rule contentsOfJar(Jar(F)) => F

endmodule

Here we see that Jar is a constructor with a single argument. You can write a
production with multiple arguments by putting the sorts of the arguments in a
comma-separated list inside the parentheses.

In this example, F is a variable. It will match either Apple() or Pear().
The return value of the function is created by substituting the matched
values of all of the variables into the variables on the right-hand side of
the rule.

To demonstrate, compile this definition and execute the following program with
krun:

contentsOfJar(Jar(Apple()))

You will see when you run it that the program returns Apple(), because that
is the pattern that was matched by F.
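
As mentioned above, a production can take multiple arguments. Here is a minimal
sketch of what that looks like (the module name and the Basket constructor are
ours, not part of the lesson files):

module LESSON-02-E
  syntax Fruit ::= Apple() | Pear()
  syntax Container ::= Basket(Fruit, Fruit)

  syntax Fruit ::= firstOf(Container) [function]

  // _ is an anonymous variable: it matches anything and discards the value
  rule firstOf(Basket(F, _)) => F
endmodule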

Exercises

  1. Extend the definition in lesson-02-a.k with the addition of blackberries
    and kiwis. For simplicity, blackberries are black and kiwis are green. Then
    compile your definition and test that your additional fruits are correctly
    handled by the colorOf function.
  2. Create a new definition which defines an outfit as a multi-argument
    constructor consisting of a hat, shirt, pants, and shoes. Define a new sort,
    Boolean, with two constructors, true and false. Each of hat, shirt, pants,
    and shoes will have a single argument (a color), either black or
    white. Then define an outfitMatching function that will return true if all
    the pieces of the outfit are the same color. You do not need to define the
    case that returns false. Write some tests that your function behaves the way
    you expect.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.3: BNF Syntax and Parser Generation.

Lesson 1.3: BNF Syntax and Parser Generation

The purpose of this lesson is to explain the full syntax and semantics of
productions in K, as well as how productions and other syntactic
sentences can be used to define grammars for parsing both rules
and programs.

K's approach to parsing

K's grammar is divided into two components: the outer syntax of K and the
inner syntax of K. Outer syntax refers to the parsing of requires,
modules, imports, and sentences in a K definition. Inner syntax
refers to the parsing of rules and programs. Unlike the outer syntax of
K, which is predetermined, much of the inner syntax of K is defined by you, the
developer. When rules or programs are parsed, they are parsed within the
context of a module. Rules are parsed in the context of the module in which
they exist, whereas programs are parsed in the context of the
main syntax module of a K definition. The productions and other syntactic
sentences in a module are used to construct the grammar of the module, which
is then used to perform parsing.

Basic BNF productions

To illustrate how this works, we will consider a simple K definition which
defines a relatively basic calculator capable of evaluating Boolean expressions
containing and, or, not, and xor.

Input the following program into your editor as file lesson-03-a.k:

module LESSON-03-A

  syntax Boolean ::= "true" | "false"
                   | "!" Boolean [function]
                   | Boolean "&&" Boolean [function]
                   | Boolean "^" Boolean [function]
                   | Boolean "||" Boolean [function]

endmodule

You will notice that the productions in this file look a little different than
the ones from the previous lesson. In point of fact, K has two different
mechanisms for defining productions. We have previously been focused
exclusively on the first mechanism, where the ::= symbol is followed by an
alphanumeric identifier followed by a comma-separated list of sorts in
parentheses. However, this is merely a special case of a more generic mechanism
for defining the syntax of productions using a variant of
BNF Form.

For example, in the previous lesson, we had the following set of productions:

module LESSON-03-B
  syntax Color ::= Yellow() | Blue()
  syntax Fruit ::= Banana() | Blueberry()
  syntax Color ::= colorOf(Fruit) [function]
endmodule

It turns out that this is equivalent to the following definition which defines
the same grammar, but using BNF notation:

module LESSON-03-C
  syntax Color ::= "Yellow" "(" ")" | "Blue" "(" ")"
  syntax Fruit ::= "Banana" "(" ")" | "Blueberrry" "(" ")"
  syntax Color ::= "colorOf" "(" Fruit ")" [function]
endmodule

In this example, the sorts of the argument to the function are unchanged, but
everything else has been wrapped in double quotation marks. This is because
in BNF notation, we distinguish between two types of production items:
terminals and non-terminals. A terminal represents simply a literal
string of characters that is verbatim part of the syntax of that production.
A non-terminal, conversely, represents a sort name, where the syntax of that
production accepts any valid term of that sort at that position.

This is why, when we wrote the program colorOf(Banana()), krun was able to
execute it: the program represented a term of sort Color that could be
parsed and interpreted by K's interpreter. In other words, krun parses and
interprets terms according to the grammar defined by the developer. The program
is automatically converted into an AST, and then the colorOf
function is evaluated using the function rules provided in the definition.

Bringing us back to the file lesson-03-a.k, we can see that this grammar
has given a simple BNF grammar for expressions over Booleans. We have defined
constructors corresponding to the Boolean values true and false, and functions
corresponding to the Boolean operators for and, or, not, and xor. We have also
given a syntax for each of these functions based on their syntax in the C
programming language. As such, we can now write programs in the simple language
we have defined.

Input the following program into your editor as and.bool in the same
directory:

true && false

We cannot interpret this program yet, because we have not given rules defining
the meaning of the && function yet, but we can parse it. To do this, you can
run (from the same directory):

kast --output kore and.bool

kast is K's just-in-time parser. It will generate a grammar from your K
definition on the fly and use it to parse the program passed on the command
line. The --output flag controls how the resulting AST is represented; don't
worry about the possible values yet, just use kore.

You ought to get the following AST printed on standard output, minus the
formatting:

inj{SortBoolean{}, SortKItem{}}(
  Lbl'UndsAnd-And-UndsUnds'LESSON-03-A'Unds'Boolean'Unds'Boolean'Unds'Boolean{}(
    Lbltrue'Unds'LESSON-03-A'Unds'Boolean{}(),
    Lblfalse'Unds'LESSON-03-A'Unds'Boolean{}()
  )
)

Don't worry about what exactly this means yet, just understand that it
represents the AST of the program that you just parsed. You ought to be able
to recognize the basic shape of it by seeing the words true, false, and
And in there. This is Kore, the intermediate representation of K, and we
will cover it in detail later.

Note that you can also tell kast to print the AST in other formats. For a
more direct representation of the original K, while still maintaining the
structure of an AST, you can say kast --output kast and.bool. This will
yield the following output:

`_&&__LESSON-03-A_Boolean_Boolean_Boolean`(
  `true_LESSON-03-A_Boolean`(.KList),
  `false_LESSON-03-A_Boolean`(.KList)
)

Note how the first output is largely a name-mangled version of the second
output. The one difference is the presence of the inj symbol in the Kore
output. We will talk more about this in later lessons.

Exercise

Parse the expression false || true with --output kast. See if you can
predict approximately what the corresponding output would be with
--output kore, then run the command yourself and compare it to your
prediction.

Ambiguities

Now let's try a slightly more advanced example. Input the following program
into your editor as and-or.bool:

true && false || false

When you try to parse this program, you ought to see the following error:

[Error] Inner Parser: Parsing ambiguity.
1: syntax Boolean ::= Boolean "||" Boolean [function]

`_||__LESSON-03-A_Boolean_Boolean_Boolean`(`_&&__LESSON-03-A_Boolean_Boolean_Boolean`(`true_LESSON-03-A_Boolean`(.KList),`false_LESSON-03-A_Boolean`(.KList)),`false_LESSON-03-A_Boolean`(.KList))
2: syntax Boolean ::= Boolean "&&" Boolean [function]

`_&&__LESSON-03-A_Boolean_Boolean_Boolean`(`true_LESSON-03-A_Boolean`(.KList),`_||__LESSON-03-A_Boolean_Boolean_Boolean`(`false_LESSON-03-A_Boolean`(.KList),`false_LESSON-03-A_Boolean`(.KList)))
        Source(./and-or.bool)
        Location(1,1,1,23)

This error is saying that kast was unable to parse this program because it is
ambiguous. K's just-in-time parser is a GLL parser, which means it can handle
the full generality of context-free grammars, including those grammars which
are ambiguous. An ambiguous grammar is one where the same string can be parsed
as multiple distinct ASTs. In this example, it can't decide whether it should
be parsed as (true && false) || false or as true && (false || false). As a
result, it reports the error to the user.

Brackets

Currently there is no way of resolving this ambiguity, making it impossible
to write complex expressions in this language. This is obviously a problem.
The standard solution in most programming languages to this problem is to
use parentheses to indicate the appropriate grouping. K generalizes this notion
into a type of production called a bracket. A bracket production in K
is any production with the bracket attribute. It is required that such a
production only have a single non-terminal, and the sort of the production
must equal the sort of that non-terminal. However, K does not otherwise
impose restrictions on the grammar the user provides for a bracket. With that
being said, the most common type of bracket is one in which a non-terminal
is surrounded by terminals representing some type of bracket such as
(), [], {}, <>, etc. For example, we can define the most common
type of bracket, the type used by the vast majority of programming languages,
quite simply.

Consider the following modified definition, which we will save to
lesson-03-d.k:

module LESSON-03-D

  syntax Boolean ::= "true" | "false"
                   | "(" Boolean ")" [bracket]
                   | "!" Boolean [function]
                   | Boolean "&&" Boolean [function]
                   | Boolean "^" Boolean [function]
                   | Boolean "||" Boolean [function]

endmodule

In this definition, if the user does not write parentheses explicitly, the
grammar remains ambiguous and K's just-in-time parser will still report an
error. However, you are now able to parse more complex programs by explicitly
grouping subterms with the bracket we have just defined.

Consider and-or-left.bool:

(true && false) || false

Now consider and-or-right.bool:

true && (false || false)

If you parse these programs with kast, you will once again get a single
unique AST with no error. If you look, you might notice that the bracket itself
does not appear in the AST. In fact, this is a property unique to brackets:
productions with the bracket attribute are not represented in the parsed AST
of a term, and the child of the bracket is folded immediately into the parent
term. This is the reason for the requirement that a bracket production have
a single non-terminal of the same sort as the production itself.

Exercise

Write out what you expect the AST to be arising from parsing these two programs
above with --output kast, then parse them yourself and compare them to the
AST you expected. Confirm for yourself that the bracket production does not
appear in the AST.

Tokens

So far we have seen how we can define the grammar of a language. However,
the grammar is not the only relevant part of parsing a language. Also relevant
is the lexical syntax of the language. Thus far, we have implicitly been using
K's automatic lexer generation to generate a token in the scanner for each
terminal in our grammar. However, sometimes we wish to define more complex
lexical syntax. For example, consider the case of integers in C: an integer
consists of a decimal, octal, or hexadecimal number followed by an optional
suffix indicating the type of the literal.

In theory it would be possible to define this syntax via a grammar, but not
only would it be cumbersome and tedious, you would also then have to deal with
an AST generated for the literal which is not convenient to work with.

Instead of doing this, K allows you to define token productions, where
a production consists of a regular expression followed by the token
attribute, and the resulting AST consists of a typed string containing the
value recognized by the regular expression.

For example, the builtin integers in K are defined using the following
production:

syntax Int ::= r"[\\+-]?[0-9]+" [token]

Here we can see that we have defined that an integer is an optional sign
followed by a nonempty sequence of digits. The r preceding the terminal
indicates that what appears inside the double quotes is a regular expression,
and the token attribute indicates that terms which parse as this production
should be converted into a token by the parser.
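
For example, in a definition where this production is part of the program
grammar, parsing the program 42 with kast --output kast should yield,
roughly, the following typed string token (a sketch of the expected shape):

#token("42","Int")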

It is also possible to define tokens that do not use regular expressions. This
can be useful when you wish to declare particular identifiers for use in your
semantics later. For example:

syntax Id ::= "main" [token]

Here, we declare that main is a token of sort Id. Instead of being parsed
as a symbol, it gets parsed as a token, generating a typed string in the AST.
This is useful in a semantics of C because the parser generally does not treat
the main function in C specially; only the semantics treats it specially.

Of course, languages can have more complex lexical syntax. For example, if we
wish to define the syntax of integers in C, we could use the following
production:

syntax IntConstant ::= r"(([1-9][0-9]*)|(0[0-7]*)|(0[xX][0-9a-fA-F]+))(([uU][lL]?)|([uU]((ll)|(LL)))|([lL][uU]?)|(((ll)|(LL))[uU]?))?" [token]

As you may have noted above, long and complex regular expressions
can be hard to read. They also suffer from the problem that unlike a grammar,
they are not particularly modular.

We can get around this restriction by declaring explicit regular expressions,
giving them a name, and then referring to them in productions.

Consider the following (equivalent) way to define the lexical syntax of
integers in C:

syntax IntConstant ::= r"({DecConstant}|{OctConstant}|{HexConstant})({IntSuffix}?)" [token]
syntax lexical DecConstant = r"{NonzeroDigit}({Digit}*)"
syntax lexical OctConstant = r"0({OctDigit}*)"
syntax lexical HexConstant = r"{HexPrefix}({HexDigit}+)"
syntax lexical HexPrefix = r"0x|0X"
syntax lexical NonzeroDigit = r"[1-9]"
syntax lexical Digit = r"[0-9]"
syntax lexical OctDigit = r"[0-7]"
syntax lexical HexDigit = r"[0-9a-fA-F]"
syntax lexical IntSuffix = r"{UnsignedSuffix}({LongSuffix}?)|{UnsignedSuffix}{LongLongSuffix}|{LongSuffix}({UnsignedSuffix}?)|{LongLongSuffix}({UnsignedSuffix}?)"
syntax lexical UnsignedSuffix = r"[uU]"
syntax lexical LongSuffix = r"[lL]"
syntax lexical LongLongSuffix = r"ll|LL"

As you can see, this is rather more verbose, but it has the benefit of both
being much easier to read and understand, and also increased modularity.
Note that we refer to a named regular expression by putting the name in curly
brackets. Note also that only the first sentence actually declares a new piece
of syntax in the language. When the user writes syntax lexical, they are only
declaring a regular expression. To declare an actual piece of syntax in the
grammar, you still must actually declare an explicit token production.

One final note: K uses Flex to implement
its lexical analysis. As a result, you can refer to the
Flex Manual
for a detailed description of the regular expression syntax supported. Note
that for performance reasons, Flex's regular expressions are actually a regular
language, and thus lack some of the syntactic convenience of modern
"regular expression" libraries. If you need features that are not part of the
syntax of Flex regular expressions, you are encouraged to express them via
a grammar instead.

Ahead-of-time parser generation

So far we have been entirely focused on K's support for just-in-time parsing,
where the parser is generated on the fly prior to being used. This approach
makes the parser fast to generate, but it costs performance when you
have to repeatedly parse programs with the same parser. For this reason, it is
generally encouraged that when parsing programs, you use K's ahead-of-time
parser generation. K makes use of
GNU Bison to generate parsers.

By default, you can enable ahead-of-time parsing via the --gen-bison-parser
flag to kompile. This will make use of Bison's LR(1) parser generator. As
such, if your grammar is not LR(1), it may not parse exactly the same as if
you were to use the just-in-time parser, because Bison will automatically pick
one of the possible branches whenever it encounters a shift-reduce or
reduce-reduce conflict. In this case, you can either modify your grammar to be
LR(1), or you can enable use of Bison's GLR support by instead passing
--gen-glr-bison-parser to kompile. Note that if your grammar is ambiguous,
the ahead-of-time parser will not provide you with particularly readable error
messages at this time.

If you have a K definition named foo.k, and it generates a directory when
you run kompile called foo-kompiled, you can invoke the ahead-of-time
parser you generated by running foo-kompiled/parser_PGM <file> on a file.
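
Concretely, for the lesson-03-d.k definition above, the ahead-of-time workflow
would look something like this:

kompile --gen-bison-parser lesson-03-d.k
lesson-03-d-kompiled/parser_PGM and-or-left.bool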

Exercises

  1. Compile lesson-03-d.k with ahead-of-time parsing enabled. Then compare
    how long it takes to run kast --output kore and-or-left.bool with how long it
    takes to run lesson-03-d-kompiled/parser_PGM and-or-left.bool. Confirm for
    yourself that both produce the same result, but that the latter is faster.

  2. Define a simple grammar consisting of integers, brackets, addition,
    subtraction, multiplication, division, and unary negation. Integers should be
    in decimal form and lexically without a sign, whereas negative numbers can be
    represented via unary negation. Ensure that you are able to parse some basic
    arithmetic expressions using a generated ahead-of-time parser. Do not worry
    about disambiguating the grammar or about writing rules to implement the
    operations in this definition.

  3. Write a program where the meaning of the arithmetic expression based on
    the grammar you defined above is ambiguous, and then write programs that
    express each individual intended meaning using brackets.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.4: Disambiguating Parses.

Lesson 1.4: Disambiguating Parses

The purpose of this lesson is to teach how to use K's builtin features for
disambiguation to transform an ambiguous grammar into an unambiguous one that
expresses the intended ASTs.

Priority blocks

In practice, very few formal languages outside the domain of natural language
processing are ambiguous. The main reason for this is that parsing unambiguous
languages is asymptotically faster than parsing ambiguous languages.
Programming language designers instead usually use the notions of operator
precedence and associativity to make expression grammars unambiguous. These
mechanisms work by instructing the parser to reject certain ASTs in favor of
others in case of ambiguities; it is often possible to remove all ambiguities
in a grammar with these techniques.

While it is sometimes possible to explicitly rewrite the grammar to remove
these parses, because K's grammar specification and AST generation are
inextricably linked, this is generally discouraged. Instead, we use the
approach of explicitly expressing the relative precedence of different
operators in different situations in order to resolve the ambiguity.

For example, in C, && binds tighter in precedence than ||, meaning that
the expression true && false || false has only one valid AST:
(true && false) || false.

Consider, then, the third iteration on the grammar of this definition
(lesson-04-a.k):

module LESSON-04-A

  syntax Boolean ::= "true" | "false"
                   | "(" Boolean ")" [bracket]
                   > "!" Boolean [function]
                   > Boolean "&&" Boolean [function]
                   > Boolean "^" Boolean [function]
                   > Boolean "||" Boolean [function]

endmodule

In this example, some of the | symbols separating productions in a single
block have been replaced with >. This serves to describe the
priority groups associated with this block of productions.

In this example, the first priority group consists of the atoms of the
language: true, false, and the bracket operator. In general, a priority
group starts either at the ::= or > operator and extends until either the
next > operator or the end of the production block. Thus, we can see that the
second, third, fourth, and fifth priority groups in this grammar all consist
of a single production.

The meaning of these priority groups becomes apparent when parsing programs:
a symbol with a lesser priority (i.e., one that binds looser) cannot
appear as the direct child of a symbol with a greater priority (i.e., one
that binds tighter). In this sense, the > operator can be seen as a
greater-than operator describing a transitive partial ordering on the
productions in the production block, expressing their relative priority.

To see this more concretely, let's look again at the program
true && false || false. As noted before, previously this program was
ambiguous because the parser could either choose that && was the child of ||
or vice versa. However, because a symbol with lesser priority (i.e., ||)
cannot appear as the direct child of a symbol with greater priority
(i.e., &&), the parser will reject the parse where || is under the
&& operator. As a result, we are left with the unambiguous parse
(true && false) || false. Similarly, true || false && false parses
unambiguously as true || (false && false). Conversely, if the user explicitly
wants the other parse, they can express this using brackets by explicitly
writing true && (false || false). This still parses successfully because the
|| operator is no longer the direct child of the && operator, but is
instead the direct child of the () operator, and the && operator is an
indirect parent, which is not subject to the priority restriction.

Astute readers, however, will already have noticed what seems to be a
contradiction: we have defined () as also having greater priority than ||.
One would think that this should mean that || cannot appear as a direct
child of (). This is a problem because priority groups are applied to every
possible parse separately. That is to say, even if the term is unambiguous
prior to this disambiguation rule, we still reject that parse if it violates
the rule of priority.

In fact, however, we do not reject this program as a parse error. Why is that?
Well, the rule for priority is slightly more complex than previously described.
In actual fact, it applies only conditionally. Specifically, it applies in
cases where the child is either the first or last production item in the
parent's production. For example, in the production Bool "&&" Bool, the
first Bool non-terminal is not preceded by any terminals, and the last Bool
non-terminal is not followed by any terminals. As a result of this, we apply
the priority rule to both children of &&. However, in the () operator,
the sole non-terminal is both preceded by and followed by terminals. As a
result, the priority rule is not applied when () is the parent. Because of
this, the program we mentioned above successfully parses.

Exercise

Parse the program true && false || false using kast, and confirm that the AST
places || as the top level symbol. Then modify the definition so that you
will get the alternative parse.

Associativity

Even having broken the expression grammar into priority blocks, the resulting
grammar is still ambiguous. We can see this if we try to parse the following
program (assoc.bool):

true && false && false

Priority blocks will not help us here: the problem arises between two parses
in which the direct parent and child are within a single priority block
(in this case, && is in the same block as itself).

This is where the notion of associativity comes into play. Associativity
applies the following additional rules to parses:

  • a left-associative symbol cannot appear as a direct rightmost child of a
    symbol with equal priority;
  • a right-associative symbol cannot appear as a direct leftmost child of a
    symbol with equal priority; and
  • a non-associative symbol cannot appear as a direct leftmost or rightmost
    child of a symbol with equal priority.

In C, binary operators are all left-associative, meaning that the expression
true && false && false parses unambiguously as (true && false) && false,
because && cannot appear as the rightmost child of itself.

Consider, then, the fourth iteration on the grammar of this definition
(lesson-04-b.k):

module LESSON-04-B

  syntax Boolean ::= "true" | "false"
                   | "(" Boolean ")" [bracket]
                   > "!" Boolean [function]
                   > left: Boolean "&&" Boolean [function]
                   > left: Boolean "^" Boolean [function]
                   > left: Boolean "||" Boolean [function]

endmodule

Here each priority group, immediately after the ::= or > operator, can
be followed by a symbol representing the associativity of that priority group:
either left: for left associativity, right: for right associativity, or
non-assoc: for non-associativity. In this example, each priority group we
apply associativity to has only a single production, but we could equally well
write a priority block with multiple productions and an associativity.

For example, consider the following, different grammar (lesson-04-c.k):

module LESSON-04-C

  syntax Boolean ::= "true" | "false"
                   | "(" Boolean ")" [bracket]
                   > "!" Boolean [function]
                   > left: 
                     Boolean "&&" Boolean [function]
                   | Boolean "^" Boolean [function]
                   | Boolean "||" Boolean [function]

endmodule

In this example, unlike the one above, &&, ^, and || have the same
priority. However, viewed as a group, the entire group is left associative.
This means that none of &&, ^, and || can appear as the right child of
any of &&, ^, or ||. As a result of this, this grammar is also not
ambiguous. However, it expresses a different grammar, and you are encouraged
to think about what the differences are in practice.

Exercise

Parse the program true && false && false yourself, and confirm that the AST
places the rightmost && at the top of the expression. Then modify the
definition to generate the alternative parse.

Explicit priority and associativity declarations

Previously we have only considered the case where all of the productions
which you wish to express a priority or associativity relation over are
co-located in the same block of productions. However, in practice this is not
always feasible or desirable, especially as a definition grows in size across
multiple modules.

As a result of this, K provides a second way of declaring priority and
associativity relations.

Consider the following grammar, which we will name lesson-04-d.k and which
expresses the exact same grammar as lesson-04-b.k:

module LESSON-04-D

  syntax Boolean ::= "true" [literal] | "false" [literal]
                   | "(" Boolean ")" [atom, bracket]
                   | "!" Boolean [not, function]
                   | Boolean "&&" Boolean [and, function]
                   | Boolean "^" Boolean [xor, function]
                   | Boolean "|" Boolean [or, function]

  syntax priorities literal atom > not > and > xor > or
  syntax left and
  syntax left xor
  syntax left or
endmodule

This introduces a couple of new features of K. First of all, we see a bunch of
attributes we don't already recognize. These are actually not built-in
attributes, but rather user-defined attributes that are used to group
productions together conceptually. For example, literal in the
syntax priorities sentence is used to refer to the productions with the
literal attribute, i.e., true and false.

Once we understand this, it becomes relatively straightforward to understand
the meaning of this grammar. Each syntax priorities sentence defines a
priority relation where each > separates a priority group containing all
the productions with at least one of the attributes in that group, and each
syntax left, syntax right, or syntax non-assoc sentence defines an
associativity relation connecting all the productions with one of the target
attributes together into a left-, right-, or non-associative grouping.
Specifically, this means that:

syntax left a b

is different from:

syntax left a
syntax left b

As a consequence of this, syntax [left|right|non-assoc] should not be used to
group together labels with different priorities.

Prefer/avoid

Sometimes priority and associativity prove insufficient to disambiguate a
grammar. In particular, sometimes it is desirable to be able to choose between
two ambiguous parses directly while still not rejecting any parses if the term
parsed is unambiguous. A good example of this is the famous "dangling else"
problem in imperative C-like languages.

Consider the following definition (lesson-04-e.k):

module LESSON-04-E

  syntax Exp ::= "true" | "false"
  syntax Stmt ::= "if" "(" Exp ")" Stmt
                | "if" "(" Exp ")" Stmt "else" Stmt
                | "{" "}"
endmodule

We can write the following program (dangling-else.if):

if (true) if (false) {} else {}

This is ambiguous because it is unclear whether the else clause is part of
the outer if or the inner if. At first we might try to resolve this with
priorities, saying that the if without an else cannot appear as a child of
the if with an else. However, because the non-terminal in the parent symbol
is both preceded and followed by a terminal, this will not work.

Instead, we can resolve the ambiguity directly by telling the parser to
"prefer" or "avoid" certain productions when ambiguities arise. For example,
when we parse this program, we see the following ambiguity as an error message:

[Error] Inner Parser: Parsing ambiguity.
1: syntax Stmt ::= "if" "(" Exp ")" Stmt

`if(_)__LESSON-04-E_Stmt_Exp_Stmt`(`true_LESSON-04-E_Exp`(.KList),`if(_)_else__LESSON-04-E_Stmt_Exp_Stmt_Stmt`(`false_LESSON-04-E_Exp`(.KList),`{}_LESSON-04-E_Stmt`(.KList),`{}_LESSON-04-E_Stmt`(.KList)))
2: syntax Stmt ::= "if" "(" Exp ")" Stmt "else" Stmt

`if(_)_else__LESSON-04-E_Stmt_Exp_Stmt_Stmt`(`true_LESSON-04-E_Exp`(.KList),`if(_)__LESSON-04-E_Stmt_Exp_Stmt`(`false_LESSON-04-E_Exp`(.KList),`{}_LESSON-04-E_Stmt`(.KList)),`{}_LESSON-04-E_Stmt`(.KList))
        Source(./dangling-else.if)
        Location(1,1,1,30)

Roughly, we see that the ambiguity is between an if with an else or an if
without an else. Since we want to pick the first parse, we can tell K to
"avoid" the second parse with the avoid attribute. Consider the following
modified definition (lesson-04-f.k):

module LESSON-04-F

  syntax Exp ::= "true" | "false"
  syntax Stmt ::= "if" "(" Exp ")" Stmt
                | "if" "(" Exp ")" Stmt "else" Stmt [avoid]
                | "{" "}"
endmodule

Here we have added the avoid attribute to the else production. As a result,
when an ambiguity occurs and one or more of the possible parses has that symbol
at the top of the ambiguous part of the parse, we remove those parses from
consideration and consider only those remaining. The prefer attribute behaves
similarly, but instead removes all parses which do not have that attribute.
In both cases, no action is taken if the parse is not ambiguous.
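
For illustration, the same parse could have been selected with prefer instead
of avoid, by marking the production we want to win. The following is a sketch
(the module name is ours):

module LESSON-04-G

  syntax Exp ::= "true" | "false"
  // prefer keeps only the parses with this symbol at the top of the ambiguity
  syntax Stmt ::= "if" "(" Exp ")" Stmt [prefer]
                | "if" "(" Exp ")" Stmt "else" Stmt
                | "{" "}"
endmodule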

Exercises

  1. Parse the program if (true) if (false) {} else {} using lesson-04-f.k
    and confirm that the else clause is part of the innermost if statement. Then
    modify the definition so that you will get the alternative parse.

  2. Modify your solution from lesson 1.3, problem 2 so that unary negation
    binds tighter than multiplication and division, which bind tighter than
    addition and subtraction, and each binary operator is left-associative.
    Write these priority and associativity declarations both inline and explicitly.

  3. Write a simple grammar containing at least one ambiguity that cannot be
    resolved via priority or associativity, and then use the prefer attribute to
    resolve that ambiguity.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.5: Modules, Imports, and Requires.

Lesson 1.5: Modules, Imports, and Requires

The purpose of this lesson is to explain how K definitions can be broken into
separate modules and files and how these distinct components combine into a
complete K definition.

K's outer syntax

Recall from Lesson 1.3 that K's grammar is broken
into two components: the outer syntax of K and the inner syntax of K.
Outer syntax, as previously mentioned, consists of requires, modules,
imports, and sentences. A K semantics is expressed by the set of
sentences contained in the definition. The scope of what is considered
contained in that definition is determined both by the main semantics
module
of a K definition, as well as the requires and imports present
in the file that contains that module.

Basic module syntax

The basic unit of grouping sentences in K is the module. A module consists
of a module name, an optional list of attributes, a list of
imports, and a list of sentences.

A module name consists of one or more groups of letters, numbers, or
underscores, separated by a hyphen. Here are some valid module names: FOO,
FOO-BAR, foo0, foo0_bar-Baz9. Here are some invalid module names: -,
-FOO, BAR-, FOO--BAR. Stylistically, module names are usually all
uppercase with hyphens separating words, but this is not strictly enforced.

Some example modules include an empty module:

module LESSON-05-A

endmodule

A module with some attributes:

module LESSON-05-B [attr1, attr2, attr3(value)]

endmodule

A module with some sentences:

module LESSON-05-C
  syntax Boolean ::= "true" | "false"
  syntax Boolean ::= "not" Boolean [function]
  rule not true => false
  rule not false => true
endmodule

Imports

Thus far we have only discussed definitions containing a single module.
Definitions can also contain multiple modules, in which one module imports
others.

An import in K appears at the top of a module, prior to any sentences. It can
be specified with the imports keyword, followed by a module name.

For example, here is a simple definition with two modules (lesson-05-d.k):

module LESSON-05-D-1
  syntax Boolean ::= "true" | "false"
  syntax Boolean ::= "not" Boolean [function]
endmodule

module LESSON-05-D
  imports LESSON-05-D-1

  rule not true => false
  rule not false => true
endmodule

This K definition is equivalent to the definition expressed by the single module
LESSON-05-C. Essentially, by importing a module, we include all of the
sentences of the imported module in the module doing the importing.
There are a few minor differences between importing a module and simply
including its sentences in another module directly, but we will cover these
differences later. For now, you can think of modules as a way of
conceptually grouping sentences in a larger K definition.

Exercise

Modify lesson-05-d.k to include four modules: one containing the syntax; two
modules, each importing the first and containing one of the two rules; and a
final module LESSON-05-D, containing no sentences, that imports the second and
third modules. Check that the definition still compiles and that you can
still evaluate the not function.

Parsing in the presence of multiple modules

As you may have noticed, each module in a definition can express a distinct set
of syntax. When parsing the sentences in a module, we use the syntax
of that module, enriched with the basic syntax of K, in order to parse
rules in that module. For example, the following definition is a parser error
(lesson-05-e.k):

module LESSON-05-E-1
  rule not true => false
  rule not false => true
endmodule

module LESSON-05-E-2
  syntax Boolean ::= "true" | "false"
  syntax Boolean ::= "not" Boolean [function]
endmodule

This is because the syntax referenced in module LESSON-05-E-1, namely, not,
true, and false, is not imported by that module. You can solve this problem
by simply importing the modules containing the syntax you want to use in your
sentences.
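
For example, a minimal fix is to add the missing import to LESSON-05-E-1 (the
order of modules within the file should not matter for this):

module LESSON-05-E-1
  imports LESSON-05-E-2

  rule not true => false
  rule not false => true
endmodule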

Main syntax and semantics modules

When we are compiling a K definition, we need to know where to start. We
designate two specific entry point modules: the main syntax module
and the main semantics module. The main syntax module, as well as all the
modules it imports recursively, are used to create the parser for programs that
you use to parse programs that you execute with krun. The main semantics
module, as well as all the modules it imports recursively, are used to
determine the rules that can be applied at runtime in order to execute a
program. For example, in the above example, if the main semantics module is
module LESSON-05-D-1, then not is an uninterpreted function (i.e., has no
rules associated with it), and the rules in module LESSON-05-D-2 are not
included.

While you can specify the entry point modules explicitly by passing the
--main-module and --syntax-module flags to kompile, by default, if you
type kompile foo.k, then the main semantics module will be FOO and the
main syntax module will be FOO-SYNTAX.
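
For example, to compile lesson-05-d.k from earlier in this lesson with
LESSON-05-D as both entry points, you could run:

kompile lesson-05-d.k --main-module LESSON-05-D --syntax-module LESSON-05-D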

Splitting a definition into multiple files

So far we have discussed ways to break definitions into separate conceptual
components (modules). K also provides a mechanism for combining multiple files
into a single K definition: the requires directive.

In K, the requires keyword has two meanings. The first, the requires
statement, appears at the top of a K file, prior to any module declarations. It
consists of the keyword requires followed by a double-quoted string. The
second meaning of the requires keyword will be covered in a later lesson,
but it is distinguished because the second case occurs only inside modules.

The string passed to the requires statement contains a filename. When you run
kompile on a file, it will look at all of the requires statements in that
file, look up those files on disk, parse them, and then recursively process all
the requires statements in those files. It then combines all the modules in all
of those files together, and uses them collectively as the set of modules to
which imports statements can refer.

Putting it all together

Putting it all together, here is one possible way in which we could break the
definition lesson-02-c.k from Lesson 1.2 into
multiple files and modules:

colors.k:

module COLORS
  syntax Color ::= Yellow()
                 | Blue()
endmodule

fruits.k:

module FRUITS
  syntax Fruit ::= Banana()
                 | Blueberry()
endmodule

colorOf.k:

requires "fruits.k"
requires "colors.k"

module COLOROF-SYNTAX
  imports COLORS
  imports FRUITS

  syntax Color ::= colorOf(Fruit) [function]
endmodule

module COLOROF
  imports COLOROF-SYNTAX

  rule colorOf(Banana()) => Yellow()
  rule colorOf(Blueberry()) => Blue()
endmodule

You would then compile this definition with kompile colorOf.k and use it the
same way as the original, single-module definition.

Exercise

Modify the name of the COLOROF module, and then recompile the definition.
Try to understand why you now get a compiler error. Then, resolve this compiler
error by passing the --main-module and --syntax-module flags to kompile.

Include path

A note is in order about how paths are resolved in requires statements.

By default, the path you specify is allowed to be an absolute or a relative
path. If the path is absolute, that exact file is imported. If the path is
relative, a matching file is looked for within all of the
include directories specified to the compiler. By default, the include
directories include the current working directory, followed by the
include/kframework/builtin directory within your installation of K. You can
also pass one or more directories to kompile via the -I command line flag,
in which case these directories are prepended to the beginning of the list.
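
For example, if fruits.k and colors.k were moved into a sibling directory
named deps (a hypothetical layout), you could compile colorOf.k with:

kompile colorOf.k -I ../deps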

Exercises

  1. Take the solution to lesson 1.4, problem 2 which included the explicit
    priority and associativity declarations, and modify the definition so that
    the syntax of integers and brackets is in one module, the syntax of addition,
    subtraction, and unary negation is in another module, and the syntax of
    multiplication and division is in a third module. Make sure you can still parse
    the same set of expressions as before. Place priority declarations in the main
    module.

  2. Modify lesson-02-d.k from lesson 1.2 so that the rules and syntax are in
    separate modules in separate files.

  3. Place the file containing the syntax from problem 2 in another directory,
    then recompile the definition. Observe why a compilation error occurs. Then
    fix the compiler error by passing -I to kompile.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.6: Integers and Booleans.

Lesson 1.6: Integers and Booleans

The purpose of this lesson is to explain the two most basic types of builtin
sorts in K, the Int sort and the Bool sort, representing
arbitrary-precision integers and Boolean values, respectively.

Builtin sorts in K

K provides definitions of some useful sorts in
domains.md, found in the
include/kframework/builtin directory of the K installation. This file is
defined via a
Literate programming
style that we will discuss in a future lesson. We will not cover all of the
sorts found there immediately; rather, this lesson discusses some of the
details surrounding integers and Booleans, and explains how to look up more
detailed information about builtin functions in K's
documentation.

Booleans in K

The most basic builtin sort K provides is the Bool sort, representing
Boolean values (i.e., true and false). You have already seen how we were
able to create this type ourselves using K's parsing and disambiguation
features. However, in the vast majority of cases, we prefer instead to import
the version of Boolean algebra defined by K itself. Most simply, you can do
this by importing the module BOOL in your definition. For example
(lesson-06-a.k):

module LESSON-06-A
  imports BOOL

  syntax Fruit ::= Blueberry() | Banana()
  syntax Bool ::= isBlue(Fruit) [function]

  rule isBlue(Blueberry()) => true
  rule isBlue(Banana()) => false
endmodule

Here we have defined a simple predicate, i.e., a function returning a
Boolean value. We are now able to perform the usual Boolean operations of
and, or, and not over these values. For example (lesson-06-b.k):

module LESSON-06-B
  imports BOOL

  syntax Fruit ::= Blueberry() | Banana()
  syntax Bool ::= isBlue(Fruit) [function]

  rule isBlue(Blueberry()) => true
  rule isBlue(Banana()) => false

  syntax Bool ::= isYellow(Fruit) [function]
                | isBlueOrYellow(Fruit) [function]

  rule isYellow(Banana()) => true
  rule isYellow(Blueberry()) => false

  rule isBlueOrYellow(F) => isBlue(F) orBool isYellow(F)
endmodule

In the above example, Boolean inclusive or is performed via the orBool
function, which is defined in the BOOL module. As a matter of convention,
many functions over builtin sorts in K are suffixed with the name of the
primary sort over which those functions are defined. This happens so that the
syntax of K does not (generally) conflict with the syntax of any other
programming language, which would make it harder to define that programming
language in K.

Exercise

Write a function isBlueAndNotYellow which computes the appropriate Boolean
expression. If you are unsure what the appropriate syntax is to use, you
can refer to the BOOL module in
domains.md. Add a term of
sort Fruit for which isBlue and isYellow both return true, and test that
the isBlueAndNotYellow function behaves as expected on all three Fruits.

Syntax Modules

For most sorts in domains.md, K defines more than one module that can be
imported by users. For example, for the Bool sort, K defines the BOOL
module that has previously already been discussed, but also provides the
BOOL-SYNTAX module. This module, unlike the BOOL module, only declares the
values true and false, but not any of the functions that operate over the
Bool sort. The rationale is that you may want to import this module into the
main syntax module of your definition in some cases, whereas you generally do
not want to do this with the version of the module that includes all the
functions over the Bool sort. For example, if you were defining the semantics
of C++, you might import BOOL-SYNTAX into the syntax module of your
definition, because true and false are part of the grammar of C++, but
you would only import the BOOL module into the main semantics module, because
C++ defines its own syntax for and, or, and not that is different from the
syntax defined in the BOOL module.

Here, for example, is how we might redefine our Boolean expression calculator
to use the Bool sort while maintaining an idiomatic structure of modules
and imports, for the first time including the rules to calculate the values of
expressions themselves (lesson-06-c.k):

module LESSON-06-C-SYNTAX
  imports BOOL-SYNTAX

  syntax Bool ::= "(" Bool ")" [bracket]
                > "!" Bool [function]
                > left:
                  Bool "&&" Bool [function]
                | Bool "^" Bool [function]
                | Bool "||" Bool [function]
endmodule

module LESSON-06-C
  imports LESSON-06-C-SYNTAX
  imports BOOL

  rule ! B => notBool B
  rule A && B => A andBool B
  rule A ^ B => A xorBool B
  rule A || B => A orBool B
endmodule

Note the encapsulation of syntax: the LESSON-06-C-SYNTAX module contains
exactly the syntax of our Boolean expressions, and no more, whereas any other
syntax needed to implement those functions is in the LESSON-06-C module
instead.

Exercise

Add an "implies" function to the above Boolean expression calculator, using the
-> symbol to represent implication. You can look up K's builtin "implies"
function in the BOOL module in domains.md.

Integers in K

Unlike most programming languages, where the most basic integer type is a
fixed-precision integer type, the most commonly used integer sort in K is
the Int sort, which represents the mathematical integers, i.e.,
arbitrary-precision integers.

K provides three main modules for import when using the Int sort. The first,
containing all the syntax of integers as well as all of the functions over
integers, is the INT module. The second, which provides just the syntax
of integer literals themselves, is the INT-SYNTAX module. However, unlike
most builtin sorts in K, K also provides a third module for the Int sort:
the UNSIGNED-INT-SYNTAX module. This module provides only the syntax of
non-negative integers, i.e., natural numbers. The reason for this involves
lexical ambiguity: in most programming languages, -1 is not itself a literal,
but rather the unary negation operator applied to the literal 1. K provides
this module to make it easier to specify the syntax of such languages.
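
For illustration, here is a minimal sketch (the module names and the Exp sort
are ours, not part of the K distribution) of how a language with negative
number syntax might use this module:

module LESSON-06-NEG-SYNTAX
  imports UNSIGNED-INT-SYNTAX

  // Because UNSIGNED-INT-SYNTAX provides only non-negative literals,
  // -1 parses unambiguously as unary negation applied to the literal 1.
  syntax Exp ::= Int | "-" Exp [function]
endmodule

module LESSON-06-NEG
  imports LESSON-06-NEG-SYNTAX
  imports INT

  rule - I:Int => 0 -Int I
endmodule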

For detailed information about the functions available over the Int sort,
refer to domains.md. Note again how most integer operations are suffixed with
Int (for example, +Int, -Int, and *Int) so that they do not collide with the
syntax of other programming languages.

Exercises

  1. Extend your solution from lesson 1.4, problem 2 to implement the rules
    that define the behavior of addition, subtraction, multiplication, and
    division. Do not worry about the case when the user tries to divide by zero
    at this time. Use /Int to implement division. Test your new calculator
    implementation by executing the arithmetic expressions you wrote as part of
    lesson 1.3, problem 2. Check to make sure each computes the value you expected.

  2. Combine the Boolean expression calculator from this lesson with your
    solution to problem 1, and then extend the combined calculator with the <,
    <=, >, >=, ==, and != expressions. Write some Boolean expressions
    that combine integer and Boolean operations, and test to ensure that these
    expressions return the expected truth value.

  3. Compute the following expressions using your solution from problem 2:
    7 / 3, 7 / -3, -7 / 3, -7 / -3. Then replace the /Int function in
    your definition with divInt instead, and observe how the value of the above
    expressions changes. Why does this occur?

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.7: Side Conditions and Rule Priority.

Lesson 1.7: Side Conditions and Rule Priority

The purpose of this lesson is to explain how to write conditional rules in K,
and to explain how to control the order in which rules are tried.

Side Conditions

So far, all of the rules we have discussed have been unconditional rules.
If the left-hand side of the rule matches the arguments to the function, the
rule applies. However, there is another type of rule, a conditional rule.
A conditional rule consists of a rule body containing the patterns to
match, and a side condition representing a Boolean expression that must
evaluate to true in order for the rule to apply.

Side conditions in K are introduced via the requires keyword immediately
following the rule body. For example, here is a rule with a side condition
(lesson-07-a.k):

module LESSON-07-A
  imports BOOL
  imports INT

  syntax Grade ::= "letter-A" 
                 | "letter-B"
                 | "letter-C"
                 | "letter-D"
                 | "letter-F"
                 | gradeFromPercentile(Int) [function]

  rule gradeFromPercentile(I) => letter-A requires I >=Int 90
endmodule

In this case, the gradeFromPercentile function takes a single integer
argument. The function evaluates to letter-A if the argument passed is
greater than or equal to 90. Note that the side condition is allowed to refer to variables
that appear on the left-hand side of the rule. In the same manner as variables
appearing on the right-hand side, variables that appear in the side condition
evaluate to the value that was matched on the left-hand side. Then the
functions in the side condition are evaluated, which returns a term of sort
Bool. If the term is equal to true, then the rule applies. Bear in mind
that the side condition is only evaluated at all if the patterns on the
left-hand side of the rule match the term being evaluated.

Exercise

Write a rule that evaluates gradeFromPercentile to letter-B if the argument
to the function is in the range [80,90). Test that the function correctly
evaluates various numbers between 80 and 100.

owise Rules

So far, all the rules we have introduced have had the same priority. What
this means is that K does not necessarily enforce an order in which the rules
are tried. We have only discussed functions so far in K, so it is not
immediately clear why this choice was made, given that a function is not
considered well-defined if multiple rules for evaluating it are capable of
evaluating the same arguments to different results. However, in future lessons
we will discuss other types of rules in K, some of which can be
non-deterministic. What this means is that if more than one rule is capable
of matching, then K will explore both possible rules in parallel, and consider
each of their respective results when executing your program. Don't worry too
much about this right now, but just understand that because of the potential
later for nondeterminism, we don't enforce a total ordering on the order in
which rules are attempted to be applied.

However, sometimes this is not practical; it can be very convenient to express
that a particular rule applies only if no other rules for that function are
applicable. This can be expressed by adding the owise attribute to a rule.
What this means, in practice, is that this rule has lower priority than other
rules, and will only be applied after all the other, higher-priority rules
have been tried and have failed.

For example, in the above exercise, we had to add a side condition containing
two Boolean comparisons to the rule we wrote to handle letter-B grades, which
means that in practice we compare the percentile against 90 twice.
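
One possible solution to that exercise (our sketch, not code from the lesson)
is the rule:

  rule gradeFromPercentile(I) => letter-B requires I >=Int 80 andBool I <Int 90
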
We can more efficiently and more idiomatically write the letter-B case for the
gradeFromPercentile rule using the owise attribute (lesson-07-b.k):

module LESSON-07-B
  imports BOOL
  imports INT

  syntax Grade ::= "letter-A" 
                 | "letter-B"
                 | "letter-C"
                 | "letter-D"
                 | "letter-F"
                 | gradeFromPercentile(Int) [function]

  rule gradeFromPercentile(I) => letter-A requires I >=Int 90
  rule gradeFromPercentile(I) => letter-B requires I >=Int 80 [owise]
endmodule

This rule is saying, "if all the other rules do not apply, then the grade is a
B if the percentile is greater than or equal to 80." Note here that we use both
a side condition and an owise attribute on the same rule. This is not
required (as we will see later), but it is allowed. What this means is that the
side condition is only tried if the other rules did not apply and the
left-hand side of the rule matched. You can even use more complex matching on
the left-hand side than simply a variable. More generally, you can also have
multiple higher-priority rules, or multiple owise rules. What this means in
practice is that all of the non-owise rules are tried first, in any order,
followed by all the owise rules, in any order.

Exercise

The grades D and F correspond to the percentile ranges [60, 70) and [0, 60)
respectively. Write another implementation of gradeFromPercentile which
handles only these cases, and uses the owise attribute to avoid redundant
Boolean comparisons. Test that various percentiles in the range [0, 70) are
evaluated correctly.

Rule Priority

As it happens, the owise attribute is a specific case of a more general
concept we call rule priority. In essence, each rule is assigned an integer
priority. Rules are tried in increasing order of priority, starting with a
rule with priority zero, and trying each increasing numerical value
successively.

By default, a rule is assigned a priority of 50. If the rule has the owise
attribute, it is instead given the priority 200. You can see why this will
cause owise rules to be tried after regular rules.

However, it is also possible to directly assign a numerical priority to a rule
via the priority attribute. For example, here is an alternative way
we could express the same two rules in the gradeFromPercentile function
(lesson-07-c.k):

module LESSON-07-C
  imports BOOL
  imports INT

  syntax Grade ::= "letter-A" 
                 | "letter-B"
                 | "letter-C"
                 | "letter-D"
                 | "letter-F"
                 | gradeFromPercentile(Int) [function]

  rule gradeFromPercentile(I) => letter-A requires I >=Int 90 [priority(50)]
  rule gradeFromPercentile(I) => letter-B requires I >=Int 80 [priority(200)]
endmodule

We can, of course, assign a priority equal to any non-negative integer. For
example, here is a more complex example that handles the remaining grades
(lesson-07-d.k):

module LESSON-07-D
  imports BOOL
  imports INT

  syntax Grade ::= "letter-A" 
                 | "letter-B"
                 | "letter-C"
                 | "letter-D"
                 | "letter-F"
                 | gradeFromPercentile(Int) [function]

  rule gradeFromPercentile(I) => letter-A requires I >=Int 90 [priority(50)]
  rule gradeFromPercentile(I) => letter-B requires I >=Int 80 [priority(51)]
  rule gradeFromPercentile(I) => letter-C requires I >=Int 70 [priority(52)]
  rule gradeFromPercentile(I) => letter-D requires I >=Int 60 [priority(53)]
  rule gradeFromPercentile(_) => letter-F                     [priority(54)]
endmodule

Note that we have introduced a new piece of syntax here: _. This is actually
just a variable. However, as a special case, when a variable is named _, it
does not bind a value that can be used on the right-hand side of the rule, or
in a side condition. Effectively, _ is a placeholder variable that means "I
don't care about this term."

In this example, we have explicitly expressed the order in which the rules of
this function are tried. Since rules are tried in increasing numerical
priority, we first try the rule with priority 50, then 51, then 52, 53, and
finally 54.

As a final note, remember that if you assign a rule a priority higher than 200,
it will be tried after a rule with the owise attribute, and if you assign
a rule a priority less than 50, it will be tried before a rule with no
explicit priority.

Exercises

  1. Write a function isEven that returns whether an integer is an even number.
    Use two rules and one side condition. The right-hand side of the rules should
    be Boolean literals. Refer back to
    domains.md for the relevant
    integer operations.

  2. Modify the calculator application from lesson 1.6, problem 2, so that division
    by zero will no longer make krun crash with a "Division by zero" exception.
    Instead, the / function should not match any of its rules if the denominator
    is zero.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.8: Literate Programming with Markdown.

Lesson 1.8: Literate Programming with Markdown

The purpose of this lesson is to teach a paradigm for performing literate
programming in K, and explain how this can be used to create K definitions
that are also documentation.

Markdown and K

The K tutorial so far has been written in
Markdown. Markdown,
for those not already familiar, is a lightweight plain-text format for styling
text. From this point onward, we assume you are familiar with Markdown and how
to write Markdown code. You can refer to the above link for a tutorial if you
are not already familiar.

What you may not necessarily realize, however, is that the K tutorial is also
a sequence of K definitions written in the manner of
Literate Programming.
For detailed information about Literate Programming, you can read the linked
Wikipedia article, but the short summary is that literate programming is a way
of intertwining documentation and code together in a manner that allows
executable code to also be, simultaneously, a documented description of that
code.

K is provided with built-in support for literate programming using Markdown.
By default, if you pass a file with the .md file extension to kompile, it
will look for any code blocks containing K code in that file, extract that
K code, and then compile it as if it were a .k file.

A K code block begins with a line containing the text ```k, and ends at the
next line containing ```.

For example, if you view the markdown source of this document, this is a K
code block:

module LESSON-08
  imports INT

Only the code inside K code blocks will actually be sent to the compiler. The
rest, while it may appear in the document when rendered by a markdown viewer,
is essentially a form of code comment.

When you have multiple K code blocks in a document, K will concatenate them
into a single file before passing it to the outer parser.

For example, the following code block contains sentences that are part of the
LESSON-08 module that we declared the beginning of above:

  syntax Int ::= Int "+" Int [function]
  rule I1 + I2 => I1 +Int I2

Exercise

Compile this file with kompile README.md --main-module LESSON-08. Confirm
that you can use the resulting compiled definition to evaluate the +
function.

Markdown Selectors

On occasion, you may want to generate multiple K definitions from a single
Markdown file. You may also wish to include a block of syntax-highlighted K
code that nonetheless does not appear as part of your K definition. It is
possible to accomplish this by means of the built-in support for syntax
highlighting in Markdown. Markdown allows a code block that was begun with
``` to be immediately followed by a string which is used to signify what
programming language the following code is written in. However, this feature
actually allows arbitrary text to appear describing that code block. Markdown
parsers are able to parse this text and render the code block differently
depending on what text appears after the backticks.

In K, you can use this functionality to specify one or more
Markdown selectors which are used to describe the code block. A Markdown
selector consists of a sequence of letters, numbers, and underscores. A code
block can be designated with a single selector by appending the selector
immediately after the backticks that open the code block.

For example, here is a code block with the foo selector:

foo bar

Note that this is not K code. By convention, K code should have the k
selector on it. You can express multiple selectors on a code block by putting
them between curly braces and prepending each with the . character. For
example, here is a code block with the foo and k selectors:

  syntax Int ::= foo(Int) [function]
  rule foo(0) => 0

Because this code block contains the k Markdown selector, by default it is
included as part of the K definition being compiled.
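
For reference, since the rendered tutorial hides the code fences, the opening
fence lines of the two code blocks above look like this in the raw Markdown
source, respectively:

```foo
```{.k .foo}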

Exercise

Confirm this fact by using krun to evaluate foo(0).

Markdown Selector Expressions

By default, as previously stated, K includes in the definition any code block
with the k selector. However, this is merely a specific instance of a general
principle, namely, that K allows you to control which selectors get included
in your K definition. This is done by means of the --md-selector flag to
kompile. This flag accepts a Markdown selector expression, which you
can essentially think of as a kind of Boolean algebra over Markdown selectors.
Each selector becomes an atom, and you can combine these atoms via the &,
|, !, and () operators.
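
For example, to compile only the code blocks tagged k that are not also tagged
foo, you might invoke kompile as follows (reusing the command from the earlier
exercise):

  kompile README.md --main-module LESSON-08 --md-selector "k & (! foo)"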

Here is a grammar, written in K, of the language of Markdown selector
expressions:

  syntax Selector ::= r"[0-9a-zA-Z_]+" [token]
  syntax SelectorExp ::= Selector
                       | "(" SelectorExp ")" [bracket]
                       > right:
                         "!" SelectorExp
                       > right:
                         SelectorExp "&" SelectorExp
                       > right:
                         SelectorExp "|" SelectorExp

Here is a selector expression that selects all the K code blocks in this
definition except the one immediately above:

k & (! selector)

Addendum

This code block exists in order to make the above lesson a syntactically valid
K definition. Consider why it is necessary.

endmodule

Exercises

  1. Compile this lesson with the selector expression k & (! foo) and confirm
    that you get a parser error if you try to evaluate the foo function with the
    resulting definition.

  2. Compile Lesson 1.3 as a K definition. Identify
    why it fails to compile. Then pass an appropriate --md-selector to the
    compiler in order to make it compile.

  3. Modify your calculator application from lesson 1.7, problem 2, to be written
    in a literate style. Consider what text might be appropriate to turn the
    resulting markdown file into documentation for your calculator.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.9: Unparsing and the format and color attributes.

Lesson 1.9: Unparsing and the format and color attributes

The purpose of this lesson is to teach the user about how terms are
pretty-printed in K, and how the user can make adjustments to the default
settings for how to print specific terms.

Parsing, Execution, and Unparsing

When you use krun to interpret a program, the tool passes through three major
phases. In the first, parsing, the program itself is parsed using either kast
or an ahead-of-time parser generated via Bison, and the resulting AST becomes
the input to the interpreter. In the second phase, execution, K evaluates
functions and (as we will discuss in depth later) performs rewrite steps to
iteratively transform the program state. The third and final phase is called
unparsing, because it consists of taking the final state of the application
after the program has been interpreted, and converting it from an AST back into
text that (in theory, anyway) could be parsed back into the same AST that was
the output of the execution phase.

In practice, unparsing is not always precisely reversible. It turns out
(although we will not cover exactly why here) that constructing a sound
algorithm which takes a grammar and an AST and emits text that parses via that
grammar to the original AST is an NP-hard problem. As a result, in the
interest of avoiding exponential-time algorithms when users rarely care about
unparsing being completely sound, we take certain shortcuts that provide a
linear-time algorithm approximating a sound solution, at the cost of the
guarantee that the result parses to the exact original term in all cases.

This is a lot of theoretical explanation, but at root, the unparsing process
is fairly simple: it takes a K term that is the output of execution and pretty
prints it according to the syntax defined by the user in their K definition.
This is useful because the original AST is not terribly user-readable, and it
is difficult to visualize the entire term or decipher information about the
final state of the program at a quick glance. Of course, in rare cases, the
pretty-printed configuration loses information of relevance, which is why K
allows you to obtain the original AST on request.

As an example of all of this, consider the following K definition
(lesson-09-a.k):

module LESSON-09-A
  imports BOOL

  syntax Exp ::= "(" Exp ")" [bracket]
               | Bool
               > "!" Exp
               > left:
                 Exp "&&" Exp
               | Exp "^" Exp
               | Exp "||" Exp

  syntax Exp ::= id(Exp) [function]
  rule id(E) => E
endmodule

This is similar to the grammar we defined in LESSON-06-C, with the difference
that the Boolean expressions are now constructors of sort Exp and we define a
trivial function over expressions that returns its argument unchanged.

We can now parse a simple program in this definition and use it to unparse some
Boolean expressions. For example (exp.bool):

id(true&&false&&!true^(false||true))

This program is not particularly legible at first glance, because all
extraneous whitespace has been removed. However, if we run krun exp.bool, we
see that the result of the unparser will pretty-print this expression rather
nicely:

<k>
  true && false && ! true ^ ( false || true ) ~> .
</k>

Notably, not only does K insert whitespace where appropriate, it is also smart
enough to insert parentheses where necessary in order to ensure the correct
parse. For example, without those parentheses, the expression above would parse
equivalent to the following one:

(((true && false) && ! true) ^ false) || true

Indeed, you can confirm this by passing that exact expression to the id
function and evaluating it, then looking at the result of the unparser:

<k>
  true && false && ! true ^ false || true ~> .
</k>

Here, because the meaning of the AST is the same both with and without
parentheses, K does not insert any parentheses when unparsing.

Exercise

Modify the grammar of LESSON-09-A above so that the binary operators are
right associative. Try unparsing exp.bool again, and note how the result is
different. Explain the reason for the difference.

Custom unparsing of terms

You may have noticed that right now, the unparsing of terms is not terribly
imaginative. All it is doing is taking each child of the term, inserting it
into the non-terminal positions of the production, then printing the production
with a space between each terminal or non-terminal. It is easy to see why this
might not be desirable in some cases. Consider the following K definition
(lesson-09-b.k):

module LESSON-09-B
  imports BOOL

  syntax Stmt ::= "{" Stmt "}" | "{" "}"
                > right:
                  Stmt Stmt
                | "if" "(" Bool ")" Stmt
                | "if" "(" Bool ")" Stmt "else" Stmt [avoid]
endmodule

This is a statement grammar, simplified to the point of meaninglessness, but
still useful as an object lesson in unparsing. Consider the following program
in this grammar (if.stmt):

if (true) {
  if (true) {}
  if (false) {}
  if (true) {
    if (false) {} else {}
  } else {
    if (false) {}
  }
}

This is how that term would be unparsed if it appeared in the output of krun:

if ( true ) { if ( true ) { } if ( false ) { } if ( true ) { if ( false ) { } else { } } else { if ( false ) { } } }

This is clearly much less legible than we started with! What are we to do?
Well, K provides an attribute, format, that can be applied to any production,
which controls how that production gets unparsed. You've seen how it gets
unparsed by default, but via this attribute, the developer has complete control
over how the term is printed. Of course, the user can trivially create ways to
print terms that would not parse back into the same term. Sometimes this is
even desirable. But in most cases, what you are interested in is controlling
the line breaking, indentation, and spacing of the production.

Here is an example of how you might choose to apply the format attribute
to improve how the above term is unparsed (lesson-09-c.k):

module LESSON-09-C
  imports BOOL

  syntax Stmt ::= "{" Stmt "}" [format(%1%i%n%2%d%n%3)] | "{" "}" [format(%1%2)]
                > right:
                  Stmt Stmt [format(%1%n%2)]
                | "if" "(" Bool ")" Stmt [format(%1 %2%3%4 %5)]
                | "if" "(" Bool ")" Stmt "else" Stmt [avoid, format(%1 %2%3%4 %5 %6 %7)]
endmodule

If we compile this new definition and unparse the same term, this is the
result we get:

if (true) {
  if (true) {}
  if (false) {}
  if (true) {
    if (false) {} else {}
  } else {
    if (false) {}
  }
}

This is the exact same text we started with! By adding the format attributes,
we were able to indent the body of code blocks, adjust the spacing of if
statements, and put each statement on a new line.

How exactly was this achieved? Well, each time the unparser reaches a term,
it looks at the format attribute of that term. That format attribute is a
mix of characters and format codes. Format codes begin with the %
character. Each character in the format attribute other than a format code is
appended verbatim to the output, and each format code is handled according to
its meaning, transformed (possibly recursively) into a string of text, and
spliced into the output at the position the format code appears in the format
string.

Provided for reference is a complete list of all valid format codes, each
followed by its meaning:

  - %n: Insert '\n' followed by the current indentation level.
  - %i: Increase the current indentation level by 1.
  - %d: Decrease the current indentation level by 1.
  - %c: Move to the next color in the list of colors for this production (see
    next section).
  - %r: Reset color to the default foreground color for the terminal (see
    next section).
  - % followed by an integer: Print a terminal or non-terminal from the
    production. The integer is treated as a 1-based index into the terminals
    and non-terminals of the production. If the offset refers to a terminal,
    move to the next color in the list of colors for this production, print
    the value of that terminal, then reset the color to the default foreground
    color for the terminal. If the offset refers to a regular-expression
    terminal, it is an error. If the offset refers to a non-terminal, unparse
    the corresponding child of the current term (starting with the current
    indentation level) and print the resulting text, then set the current
    color and indentation level to the color and indentation level following
    unparsing that term.
  - % followed by any other character: Print that character verbatim.
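
As a worked example, consider the attribute format(%1%i%n%2%d%n%3) on the
"{" Stmt "}" production of LESSON-09-C: %1 prints the { terminal, %i increases
the indentation level, %n inserts a newline followed by the new indentation,
%2 unparses the inner statement, %d decreases the indentation level, the
second %n inserts a newline at the restored indentation, and %3 prints the
closing } terminal. This is why block bodies appear indented on their own
lines in the output shown earlier.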

Exercise

Change the format attributes for LESSON-09-C so that if.stmt will unparse
as follows:

if (true)
{
  if (true)
  {
  }
  if (false)
  {
  }
  if (true)
  {
    if (false)
    {
    }
    else
    {
    }
  }
  else
  {
    if (false)
    {
    }
  }
}

Output coloring

When the output of unparsing is displayed on a terminal supporting colors, K
is capable of coloring the output, similar to what is possible with a syntax
highlighter. This is achieved via the color and colors attributes.

Essentially, both the color and colors attributes are used to construct a
list of colors associated with each production, and then the format attribute
is used to control how those colors are used to unparse the term. At its most
basic level, you can set the color attribute to color all the terminals in
the production a certain color, or you can use the colors attribute to
specify a comma-separated list of colors for each terminal in the production.
At a more advanced level, the %c and %r format codes control how the
formatter interacts with the list of colors specified by the colors
attribute. You can essentially think of the color attribute as a way of
specifying that you want all the colors in the list to be the same color.

Note that the %c and %r format codes are relatively primitive in nature.
The color and colors attributes merely maintain a list of colors, whereas
the %c and %r format codes merely control how to advance through that list
and how individual text is colored.

It is an error if the colors attribute does not provide all the colors needed
by the terminals and escape codes in the production. %r does not change the
position in the list of colors at all, so the next %c will advance to the
following color.
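
For instance (a sketch of ours; the choice of colors is arbitrary), the if
production from LESSON-09-B could assign one color to each of its three
terminals like so:

  syntax Stmt ::= "if" "(" Bool ")" Stmt [colors(yellow, white, white)]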

As a complete example, here is a variant of LESSON-09-A which colors the
various boolean operators:

module LESSON-09-D
  imports BOOL

  syntax Exp ::= "(" Exp ")" [bracket]
               | Bool
               > "!" Exp [color(yellow)]
               > left:
                 Exp "&&" Exp [color(red)]
               | Exp "^" Exp [color(blue)]
               | Exp "||" Exp [color(green)]

  syntax Exp ::= id(Exp) [function]
  rule id(E) => E
endmodule

For a complete list of allowed colors, see
here.

Exercises

  1. Use the color attribute on LESSON-09-C to color the keywords true and
    false one color, the keywords if and else another color, and the operators
    (, ), {, and } a third color.

  2. Use the format, color, and colors attributes to tell the unparser to
    style the expression grammar from lesson 1.8, problem 3 according to your own
    personal preferences for syntax highlighting and code formatting. You can
    view the result of the unparser on a function term without evaluating that
    function by means of the command kparse <file> | kore-print -.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.10: Strings.

Lesson 1.10: Strings

The purpose of this lesson is to explain how to use the String sort in K to
represent sequences of characters, and explain where to find additional
information about builtin functions over strings.

The String Sort

In addition to the Int and Bool sorts covered in
Lesson 1.6, K provides, among others, the
String sort to represent sequences of characters. You can import this
functionality via the STRING-SYNTAX module, which contains the syntax of
string literals in K, and the STRING module, which contains all the functions
that operate over the String type.

Strings in K are double-quoted. The following list of escape sequences is
supported:

  - \" : the literal character "
  - \\ : the literal character \
  - \n : the newline character (ASCII code 0x0a)
  - \r : the carriage return character (ASCII code 0x0d)
  - \t : the tab character (ASCII code 0x09)
  - \f : the form feed character (ASCII code 0x0c)
  - \x followed by 2 hexadecimal digits : a code point between 0x00 and 0xFF
  - \u followed by 4 hexadecimal digits : a code point between 0x0000 and 0xFFFF
  - \U followed by 8 hexadecimal digits : a code point between 0x000000 and 0x10FFFF

Please note that, at present, K's Unicode support is incomplete, so you may
run into errors using code points greater than 0xFF.

As an example, you can construct a string literal containing the following
block of text:

This is an example block of text.
Here is a quotation: "Hello world."
	This line is indented.
ÁÉÍÓÚ

Like so:

"This is an example block of text.\nHere is a quotation: \"Hello world.\"\n\tThis line is indented.\n\xc1\xc9\xcd\xd3\xda\n"

Basic String Functions

The full list of functions provided for the String sort can be found in
domains.md, but here we
describe a few of the more basic ones.

String concatenation

The concatenation operator for strings is +String. For example, consider
the following K rule that constructs a string from component parts
(lesson-10.k):

module LESSON-10
  imports STRING

  syntax String ::= msg(String) [function]
  rule msg(S) => "The string you provided: " +String S +String "\nHave a nice day!"
endmodule

Note that this operator is O(N), so repeated concatenations are inefficient.
For information about efficient string concatenation, refer to
Lesson 2.14.

String length

The function to return the length of a string is lengthString. For example,
lengthString("foo") will return 3, and lengthString("") will return 0.
The return value is the length of the string in code points.

Substring computation

The function to compute the substring of a string is substrString. It takes
a string and two integer indices, starting from 0, and returns the substring
within the range [start..end). It is only defined if end >= start, start >= 0,
and end <= the length of the string. Here, for example, we return the first 5
characters
of a string:

substrString(S, 0, 5)

Here we return all but the first 3 characters:

substrString(S, 3, lengthString(S))

Exercises

  1. Write a function that takes a paragraph of text (i.e., a sequence of
    sentences, each ending in a period), and constructs a new (nonsense) sentence
    composed of the first word of each sentence, followed by a period. Do not
    worry about capitalization or periods within the sentence which do not end the
    sentence (e.g., "Dr."). You can assume that all whitespace within the paragraph
    consists of spaces. For more information about the functions over strings required to
    implement such a function, refer to domains.md.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.11: Casting Terms.

Lesson 1.11: Casting Terms

The purpose of this lesson is to explain how to use cast expressions in
order to disambiguate terms using sort information. We also explain how the
variable sort inference algorithm works in K, and how to change the default
behavior by casting variables to a particular sort.

Casting in K

Sometimes the grammar you write for your rules in K can be a little bit
ambiguous on purpose. While grammars for programming languages may be
unambiguous when considered in their entirety, K allows you to write rules
involving arbitrary fragments of that grammar, and those fragments can
sometimes be ambiguous by themselves, or similar enough to other fragments
of the grammar to trigger ambiguity. As a result, in addition to the tools
covered in Lesson 1.4, K provides one
additional powerful tool for disambiguation: cast expressions.

K provides three main types of casts: the semantic cast, the strict cast, and
the projection cast. We will cover each of them, and their similarities and
differences, in turn.

Semantic casts

The most basic, and most common, type of cast in K is called the
semantic cast. For every sort S declared in a module, K provides the
following (implicit) production for use in sentences:

  syntax S ::= S ":S"

Note that S simply represents the name of the sort. For example, if we
defined a sort Exp, the actual production for that sort would be:

  syntax Exp ::= Exp ":Exp"

At runtime, this expression will not actually exist; it is merely an annotation
to the compiler describing the sort of the term inside the cast. It is telling
the compiler that the term inside the cast must be of sort Exp. For example,
if we had the following grammar:

module LESSON-11-A
  imports INT

  syntax Exp ::= Int | Exp "+" Exp
  syntax Stmt ::= "if" "(" Exp ")" Stmt | "{" "}"
endmodule

Then we would be able to write 1:Exp, or (1 + 2):Exp, but not {}:Exp.

You can also restrict the sort that a variable in a rule will match by casting
it. For example, consider the following additional module:

module LESSON-11-B
  imports LESSON-11-A
  imports BOOL

  syntax Term ::= Exp | Stmt
  syntax Bool ::= isExpression(Term) [function]

  rule isExpression(_E:Exp) => true
  rule isExpression(_) => false [owise]
endmodule

Here we have defined a very simple function that decides whether a term is
an expression or a statement. It does this by casting the variable inside the
isExpression rule to sort Exp. As a result, that variable will only match terms
of sort Exp. Thus, isExpression(1) will return true, as will isExpression(1 + 2), but
isExpression({}) will return false.

Exercise

Verify this fact for yourself by running isExpression on the above examples. Then
write an isStatement function, and test that it works as expected.

Strict casts

On occasion, a semantic cast is not strict enough. It might be that you want
to, for disambiguation purposes, say exactly what sort a term is. For
example, consider the following definition:

module LESSON-11-C
  imports INT

  syntax Exp ::= Int | Exp "+" Exp [exp]
  syntax Exp2 ::= Exp | Exp2 "+" Exp2 [exp2]
endmodule

This grammar is a little ambiguous and contrived, but it serves to demonstrate
how a semantic cast might be insufficient to disambiguate a term. If we were
to write the term (I1:Int + I2:Int):Exp2, the term would be ambiguous,
because the cast is not sufficiently strict to determine whether you mean
to derive the "+" production tagged exp, or the one tagged exp2.

In this situation, there is a solution: the strict cast. For every sort
S in your grammar, K also defines the following production:

  syntax S ::= S "::S"

This may at first glance seem the same as the previous cast. And indeed,
from the perspective of the grammar and from the perspective of rewriting,
they are in fact identical. However, the second variant has a unique meaning
in the type system of K: namely, the term inside the cast cannot be a
subsort, i.e., a term of another sort S2 such that the production
syntax S ::= S2 exists.

As a result, if we were to write in the above grammar the term
(I1:Int + I2:Int)::Exp2, then we would know that the second derivation above
should be chosen, whereas if we want the first derivation, we could write
(I1:Int + I2:Int)::Exp.

Projection casts

Thus far we have focused entirely on casts which exist solely to inform the
compiler about the sort of terms. However, sometimes when dealing with grammars
containing subsorts, it can be desirable to reason with the subsort production
itself, which injects one sort into another. Remember from above that such
a production looks like syntax S ::= S2. This type of production, called a
subsort production, can be thought of as a type of inheritance involving
constructors. If we have the above production in our grammar, we say that S2
is a subsort of S, or that any S2 is also an S. K implicitly maintains a
symbol at runtime which keeps track of where such subsortings occur; this
symbol is called an injection.

Sometimes, when one sort is a subsort of another, it can be the case that
a function returns one sort, but you actually want to cast the result of
calling that function to another sort which is a subsort of the first sort.
This is similar to what happens with inheritance in an object-oriented
language, where you might cast a superclass to a subclass if you know for
sure the object at runtime is in fact an instance of that class.

K provides something similar for subsorts: the projection cast.

For each pair of sorts S and S2, K provides the following production:

  syntax S ::= "{" S2 "}" ":>S"

What this means is that you take any term of sort S2 and cast it to sort
S. If the term of sort S2 consists of an injection containing a term of sort
S, then this will return that term. Otherwise, an error occurs and rewriting
fails, returning the projection function which failed to apply. The sort is
not actually checked at compilation time; rather, it is a runtime check
inserted into the code that runs when the rule applies.

For example, here is a module that makes use of projection casts:

module LESSON-11-D
  imports INT
  imports BOOL

  syntax Exp ::= Int | Bool | Exp "+" Exp | Exp "&&" Exp

  syntax Exp ::= eval(Exp) [function]
  rule eval(I:Int) => I
  rule eval(B:Bool) => B
  rule eval(E1 + E2) => {eval(E1)}:>Int +Int {eval(E2)}:>Int
  rule eval(E1 && E2) => {eval(E1)}:>Bool andBool {eval(E2)}:>Bool
endmodule

Here we have defined constructors for a simple expression language over
Booleans and integers, as well as a function eval that evaluates these
expressions to a value. Because that value could be an integer or a Boolean,
we need the casts in the last two rules in order to meet the type signature of
+Int and andBool. Of course, the user can write ill-formed expressions like
1 && true or false + true, but these will cause errors at runtime, because
the projection cast will fail.

Exercises

  1. Extend the eval function in LESSON-11-D to include Strings and add a .
    operator which concatenates them.

  2. Modify your solution from lesson 1.9, problem 2 by using an Exp sort to
    express the integer and Boolean expressions that it supports, in the same style
    as LESSON-11-D. Then write an eval function that evaluates all terms of
    sort Exp to either a Bool or an Int.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.12: Syntactic Lists.

Lesson 1.12: Syntactic Lists

The purpose of this lesson is to explain how K provides support for syntactic
repetition through the use of the List{} and NeList{} constructs,
generally called syntactic lists.

The List{} construct

Sometimes, when defining a grammar in K, it is useful to define a syntactic
construct consisting of an arbitrary-length sequence of items. For example,
you might wish to define a function call construct, and need to express a way
of passing arguments to the function. You can in theory simply define these
productions using ordinary constructors, but it can be tricky to get the syntax
exactly right in K without a lot of tedious glue code.

For this reason, K provides a way of specifying that a non-terminal represents
a syntactic list (lesson-12-a.k):

module LESSON-12-A-SYNTAX
  imports INT-SYNTAX

  syntax Ints ::= List{Int,","}
endmodule

module LESSON-12-A
  imports LESSON-12-A-SYNTAX
endmodule

Note that instead of a sequence of terminals and non-terminals, the right hand
side of the Ints production contains the symbol List followed by two items
in curly braces. The first item is the non-terminal which is the element type
of the list, and the second item is a terminal representing the separator of
the list. As a special case, lists which are separated only by whitespace can
be specified with a separator of "".
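
For example (an illustration of ours, assuming some Stmt sort has been
declared), a whitespace-separated list of statements could be declared as:

  syntax Stmts ::= List{Stmt,""}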

This List{} construct is roughly equivalent to the following definition
(lesson-12-b.k):

module LESSON-12-B-SYNTAX
  imports INT-SYNTAX

  syntax Ints ::= Int "," Ints | ".Ints"
endmodule

module LESSON-12-B
  imports LESSON-12-B-SYNTAX
endmodule

As you can see, the List{} construct represents a cons-list with an element
at the head and another list at the tail. The empty list is represented by
a . followed by the sort of the list.

However, the List{} construct provides several key syntactic conveniences
over the above definition. First of all, when writing a list in a rule,
explicitly writing the terminator is not always required. For example, consider
the following additional module (lesson-12-c.k):

module LESSON-12-C
  imports LESSON-12-A
  imports INT

  syntax Int ::= sum(Ints) [function]
  rule sum(I:Int) => I
  rule sum(I1:Int, I2:Int, Is:Ints) => sum(I1 +Int I2, Is)
endmodule

Here we see a function that sums together a non-empty list of integers. Note in
particular the first rule. We do not explicitly mention .Ints, but in fact,
the rule in question is equivalent to the following rule:

  rule sum(I:Int, .Ints) => I

The reason for this is that K will automatically insert a list terminator
anywhere a syntactic list is expected, but an element of that list appears
instead. This works even with lists of more than one element:

  rule sum(I1:Int, I2:Int) => I1 +Int I2

This rule is redundant, but here we explicitly match a list of exactly two
elements, because the .Ints is implicitly added after I2.

Exercise

Write a function concat which takes a list of String and concatenates them
all together. Do not worry if the function is O(n^2).

Parsing Syntactic Lists in Programs

An additional syntactic convenience applies when you want to express a
syntactic list in the input to krun. In this case, K will automatically
transform the grammar in LESSON-12-B-SYNTAX into the following
(lesson-12-d.k):

module LESSON-12-D
  imports INT-SYNTAX

  syntax Ints ::= #NonEmptyInts | #IntsTerminator
  syntax #NonEmptyInts ::= Int "," #NonEmptyInts
                         | Int #IntsTerminator
  syntax #IntsTerminator ::= ""
endmodule

This allows you to express the usual comma-separated list of arguments where
an empty list is represented by the empty string, and you don't have to
explicitly terminate the list. Because of this, we can write the syntax
of function calls in C very easily (lesson-12-e.k):

module LESSON-12-E
  syntax Id ::= r"[a-zA-Z_][a-zA-Z0-9_]*" [token]
  syntax Exp ::= Id | Exp "(" Exps ")"
  syntax Exps ::= List{Exp,","}
endmodule

Exercise

Write some function call expressions using identifiers in C and verify with
kast that the above grammar captures the intended syntax. Make sure to test
with function calls with zero, one, and two or more arguments.

The NeList{} construct

One limitation of the List{} construct is that it is always possible to
write a list of zero elements where a List{} is expected. While this is
desirable in a number of cases, it is sometimes not what the grammar expects.

For example, in C, it is not allowable for an enum definition to have zero
members. In other words, if we were to write the grammar for enumerations like
so (lesson-12-f.k):

module LESSON-12-F
  syntax Id ::= r"[a-zA-Z_][a-zA-Z0-9_]*" [token]
  syntax Exp ::= Id

  syntax EnumSpecifier ::= "enum" Id "{" Ids "}"
  syntax Ids ::= List{Id,","}
endmodule

Then we would be syntactically allowed to write enum X {}, which ought
instead to be a syntax error.

For this reason, we introduce the additional NeList{} construct. The syntax
is identical to List{}, except with NeList instead of List before the
curly braces. When parsing rules, it behaves identically to the List{}
construct. However, when parsing inputs to krun, the above grammar, if we
replaced syntax Ids ::= List{Id,","} with syntax Ids ::= NeList{Id,","},
would become equivalent to the following (lesson-12-g.k):

module LESSON-12-G
  syntax Id ::= r"[a-zA-Z_][a-zA-Z0-9_]*" [token]
  syntax Exp ::= Id

  syntax EnumSpecifier ::= "enum" Id "{" Ids "}"
  syntax Ids ::= Id | Id "," Ids
endmodule

In other words, only non-empty lists of Id would be allowed.

Exercises

  1. Modify the sum function in LESSON-12-C so that the Ints sort is an
    NeList{}. Verify that calling sum() with no arguments is now a syntax
    error.

  2. Write a modified sum function with the List construct that can also sum
    up an empty list of arguments. In such a case, the sum ought to be 0.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.13: Basics of K Rewriting.

Lesson 1.13: Basics of K Rewriting

The purpose of this lesson is to explain how rewrite rules that are not the
definition of a function behave, and how, using these rules, you can construct
a semantics of programs in a programming language in K.

Recap: Function rules in K

Recall from Lesson 1.2 that we have, thus far,
introduced two types of productions in K: constructors and functions.
A function is identified by the function attribute placed on the
production. As you may recall, when we write a rule with a function on the
left-hand side of the => operator, we are defining the meaning of that
function for inputs which match the patterns on the left-hand side of the rule.
If the arguments to the function match the patterns, then the function is
evaluated to the value constructed by substituting the bindings for the
variables into the right-hand side of the rule.

Top-level rules

However, function rules are not the only type of rule permissible in K, nor
even the most frequently used. K also has a concept of a
top-level rewrite rule. The simplest way to ensure that a rule is treated
as a top-level rule is for the left-hand side of the rule to mention one or
more cells. We will cover how cells work and are declared in more detail
in a later lesson, but for now, what you should know is that when we ran krun
in our very first example in Lesson 1.2 and got the following output:

<k>
  Yellow ( ) ~> .
</k>

<k> is a cell, known by convention as the K cell. This cell is available
by default in any definition without needing to be explicitly declared.

The K cell contains a single term of sort K. K is a predefined sort in K
with two constructors, that can be roughly represented by the following
grammar:

  syntax K ::= KItem "~>" K
             | "."

As a syntactic convenience, K allows you to treat ~> like it is an
associative list (i.e., as if it were defined as syntax K ::= K "~>" K), but
when a definition is compiled, it will automatically transform the rules you
write so that they treat the K sort as a cons-list. Another syntactic
convenience is that, for disambiguation purposes, you can write .K anywhere
you would otherwise write . and the meaning is identical.

Now, you may notice that the above grammar mentions the sort KItem. This is
another built-in sort in K. For every sort S declared in a definition (with
the exception of K and KItem), K will implicitly insert the following
production:

  syntax KItem ::= S

In other words, every sort is a subsort of the sort KItem, and thus a term
of any sort can be injected as an element of a term of sort K, also called
a K sequence.
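
For example, assuming a definition declaring sorts Int and Bool, the following
is a K sequence containing two items:

  1 ~> true ~> .K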

By default, when you krun a program, the AST of the program is inserted as
the sole element of a K sequence into the <k> cell. This explains why we
saw the output we did in Lesson 1.2.

With these preliminaries in mind, we can now explain how top-level rewrite
rules work in K. Put simply, any rule where there is a cell (such as the K
cell) at the top on the left-hand side will be a top-level rewrite rule. Once
the initial program has been inserted into the K cell, the resulting term,
called the configuration, will be matched against all the top-level
rewrite rules in the definition. If only one rule matches, the substitution
generated by the matching will be applied to the right-hand side of the rule
and the resulting term is rewritten to be the new configuration. Rewriting
proceeds by iteratively applying rules, also called taking steps, until
no top-level rewrite rule can be applied. At this point the configuration
becomes the final configuration and is output by krun.
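
As a minimal sketch (the module and sort names are ours), here is a definition
containing a single top-level rewrite rule; a configuration whose <k> cell
begins with foo rewrites in one step to one beginning with bar:

module LESSON-13-SKETCH
  syntax Sym ::= "foo" | "bar"

  // The cell on the left-hand side makes this a top-level rewrite rule.
  rule <k> foo ~> K:K </k> => <k> bar ~> K </k>
endmodule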

If more than one top-level rule applies, by default, K will pick just one
of those rules, apply it, and continue rewriting. However, it is
non-deterministic which rule applies. In theory, it could be any of them.
By passing the --search flag to krun, you are able to tell krun to
explore all possible non-deterministic choices, and generate a complete list of
all possible final configurations reachable by each non-deterministic choice that
can be made. Note that the --search flag to krun only works if you pass
--enable-search to kompile first.

Exercise

Pass a program containing no functions to krun. You can use a term of sort
Exp from LESSON-11-D. Observe the output and try to understand why you get
the output you do. Then write two rules that rewrite that program to another.
Run krun --search on that program and observe both results. Then add a third
rule that rewrites one of those results again. Test that that rule applies as
well.

Using top-level rules to evaluate expressions

Thus far, we have focused primarily on defining functions over constructors
in K. However, now that we have a basic understanding of top-level rules,
it is possible to introduce a rewrite system to our definitions. A rewrite
system is a collection of top-level rewrite rules which performs an organized
transformation of a particular program into a result which expresses the
meaning of that program. For example, we might rewrite an expression in a
programming language into a value representing the result of evaluating that
expression.

Recall in Lesson 1.11, we wrote a simple grammar of Boolean and integer
expressions that looked roughly like this (lesson-13-a.k):

module LESSON-13-A
  imports INT

  syntax Exp ::= Int
               | Bool
               | Exp "+" Exp
               | Exp "&&" Exp
endmodule

In that lesson, we defined a function eval which evaluated such expressions
to either an integer or Boolean.

However, it is more idiomatic to evaluate such expressions using top-level
rewrite rules. Here is how one might do so in K (lesson-13-b.k):

module LESSON-13-B-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  imports BOOL-SYNTAX

  syntax Val ::= Int | Bool
  syntax Exp ::= Val
               > left: Exp "+" Exp
               > left: Exp "&&" Exp
endmodule

module LESSON-13-B
  imports LESSON-13-B-SYNTAX
  imports INT
  imports BOOL

  rule <k> I1:Int + I2:Int ~> K:K </k> => <k> I1 +Int I2 ~> K </k>
  rule <k> B1:Bool && B2:Bool ~> K:K </k> => <k> B1 andBool B2 ~> K </k>

  syntax KItem ::= freezer1(Val) | freezer2(Exp)
                 | freezer3(Val) | freezer4(Exp)

  rule <k> E1:Val + E2:Exp ~> K:K </k> => <k> E2 ~> freezer1(E1) ~> K </k> [priority(51)]
  rule <k> E1:Exp + E2:Exp ~> K:K </k> => <k> E1 ~> freezer2(E2) ~> K </k> [priority(52)]
  rule <k> E1:Val && E2:Exp ~> K:K </k> => <k> E2 ~> freezer3(E1) ~> K </k> [priority(51)]
  rule <k> E1:Exp && E2:Exp ~> K:K </k> => <k> E1 ~> freezer4(E2) ~> K </k> [priority(52)]

  rule <k> E2:Val ~> freezer1(E1) ~> K:K </k> => <k> E1 + E2 ~> K </k>
  rule <k> E1:Val ~> freezer2(E2) ~> K:K </k> => <k> E1 + E2 ~> K </k>
  rule <k> E2:Val ~> freezer3(E1) ~> K:K </k> => <k> E1 && E2 ~> K </k>
  rule <k> E1:Val ~> freezer4(E2) ~> K:K </k> => <k> E1 && E2 ~> K </k>
endmodule

This is of course rather cumbersome currently, but we will soon introduce
syntactic convenience which makes writing definitions of this type considerably
easier. For now, notice that there are roughly three types of rules here: the first
matches a K cell in which the first element of the K sequence is an Exp whose
arguments are values, and rewrites the first element of the sequence to the
result of that expression. The second also matches a K cell with an Exp in
the first element of its K sequence, but it matches when one or both arguments
of the Exp are not values, and replaces the first element of the K sequence
with two new elements: one being an argument to evaluate, and the other being
a special constructor called a freezer. Finally, the third matches a K
sequence where a Val is first, and a freezer is second, and replaces them
with a partially evaluated expression.

This general pattern is what is known as heating an expression,
evaluating its arguments, cooling the arguments into the expression
again, and evaluating the expression itself. By repeatedly performing
this sequence of actions, we can evaluate an entire AST containing a complex
expression down into its resulting value.
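
For example, under the rules of LESSON-13-B, the program 1 + 2 + 3 (which
parses as (1 + 2) + 3, since + is left associative) passes through the
following sequence of configurations, one rewrite step per line (our worked
trace of the contents of the <k> cell):

  (1 + 2) + 3 ~> .
  1 + 2 ~> freezer2(3) ~> .
  3 ~> freezer2(3) ~> .
  3 + 3 ~> .
  6 ~> .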

Exercise

Write an addition expression with integers. Use krun --depth 1 to see the
result of rewriting after applying a single top-level rule. Gradually increase
the value of --depth to see successive states. Observe how this combination
of rules is eventually able to evaluate the entire expression.

Simplifying the evaluator: Local rewrites and cell ellipses

As you saw above, the definition we wrote is rather cumbersome. Over the
remainder of Lessons 1.13 and 1.14, we will greatly simplify it. The first step
in doing so is to teach a bit more about the rewrite operator, =>. Thus far,
all the rules we have written look like rule LHS => RHS. However, this is not
the only way the rewrite operator can be used. It is actually possible to place
a constructor or function at the very top of the rule, and place rewrite
operators inside that term. While a rewrite operator cannot appear nested
inside another rewrite operator, writing rewrites inside a term lets us express
that some parts of what we are matching are not changed by the rule. For
example, consider the following rule from above:

  rule <k> I1:Int + I2:Int ~> K:K </k> => <k> I1 +Int I2 ~> K </k>

We can equivalently write it as follows:

  rule <k> (I1:Int + I2:Int => I1 +Int I2) ~> _:K </k>

When you put a rewrite inside a term like this, in essence, you are telling
the rule to only rewrite part of the left-hand side to the right-hand side.
In practice, this is implemented by lifting the rewrite operator to the top of
the rule by means of duplicating the surrounding context.

There is a way that the above rule can be simplified further, however. K
provides a special syntax for each cell containing a term of sort K, indicating
that we want to match only on some prefix of the K sequence. For example, the
above rule can be simplified further like so:

  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>

Here we have placed the symbol ... immediately prior to the </k> which ends
the cell. What this tells the compiler is to take the contents of the cell,
treat it as the prefix of a K sequence, and insert an anonymous variable of
sort K at the end. Thus we can think of ... as a way of saying we
don't care about the part of the K sequence after the beginning, leaving
it unchanged.

Putting all this together, we can rewrite LESSON-13-B like so
(lesson-13-c.k):

module LESSON-13-C-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  imports BOOL-SYNTAX

  syntax Val ::= Int | Bool
  syntax Exp ::= Val
               > left: Exp "+" Exp
               > left: Exp "&&" Exp
endmodule

module LESSON-13-C
  imports LESSON-13-C-SYNTAX
  imports INT
  imports BOOL

  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>
  rule <k> B1:Bool && B2:Bool => B1 andBool B2 ...</k>

  syntax KItem ::= freezer1(Val) | freezer2(Exp)
                 | freezer3(Val) | freezer4(Exp)

  rule <k> E1:Val + E2:Exp => E2 ~> freezer1(E1) ...</k> [priority(51)]
  rule <k> E1:Exp + E2:Exp => E1 ~> freezer2(E2) ...</k> [priority(52)]
  rule <k> E1:Val && E2:Exp => E2 ~> freezer3(E1) ...</k> [priority(51)]
  rule <k> E1:Exp && E2:Exp => E1 ~> freezer4(E2) ...</k> [priority(52)]

  rule <k> E2:Val ~> freezer1(E1) => E1 + E2 ...</k>
  rule <k> E1:Val ~> freezer2(E2) => E1 + E2 ...</k>
  rule <k> E2:Val ~> freezer3(E1) => E1 && E2 ...</k>
  rule <k> E1:Val ~> freezer4(E2) => E1 && E2 ...</k>
endmodule

This is still rather cumbersome, but it is already greatly simplified. In the
next lesson, we will see how additional features of K can be used to specify
heating and cooling rules much more compactly.

Exercises

  1. Modify LESSON-13-C to add rules to evaluate integer subtraction.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.14: Defining Evaluation Order.

Lesson 1.14: Defining Evaluation Order

The purpose of this lesson is to explain how to use the heat and cool
attributes, context and context alias sentences, and the strict and
seqstrict attributes to more compactly express heating and cooling in K,
and to express more advanced evaluation strategies in K.

The heat and cool attributes

Thus far, we have been using rule priority and casts to express when to heat
an expression and when to cool it. For example, the rules for heating have
lower priority, so they do not apply if the term could be evaluated instead,
and both the heating and the cooling rules use the Val sort to ensure that
they apply only once the relevant argument has been evaluated to a value.

However, K has built-in support for deciding when to heat and when to cool.
This support comes in the form of the rule attributes heat and cool as
well as the specially named function isKResult.

Consider the following definition, which is equivalent to LESSON-13-C
(lesson-14-a.k):

module LESSON-14-A-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  imports BOOL-SYNTAX

  syntax Exp ::= Int
               | Bool
               > left: Exp "+" Exp
               > left: Exp "&&" Exp
endmodule

module LESSON-14-A
  imports LESSON-14-A-SYNTAX
  imports INT
  imports BOOL

  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>
  rule <k> B1:Bool && B2:Bool => B1 andBool B2 ...</k>

  syntax KItem ::= freezer1(Exp) | freezer2(Exp)
                 | freezer3(Exp) | freezer4(Exp)

  rule <k> E:Exp + HOLE:Exp => HOLE ~> freezer1(E) ...</k>
    requires isKResult(E) [heat]
  rule <k> HOLE:Exp + E:Exp => HOLE ~> freezer2(E) ...</k> [heat]
  rule <k> E:Exp && HOLE:Exp => HOLE ~> freezer3(E) ...</k>
    requires isKResult(E) [heat]
  rule <k> HOLE:Exp && E:Exp => HOLE ~> freezer4(E) ...</k> [heat]

  rule <k> HOLE:Exp ~> freezer1(E) => E + HOLE ...</k> [cool]
  rule <k> HOLE:Exp ~> freezer2(E) => HOLE + E ...</k> [cool]
  rule <k> HOLE:Exp ~> freezer3(E) => E && HOLE ...</k> [cool]
  rule <k> HOLE:Exp ~> freezer4(E) => HOLE && E ...</k> [cool]

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_:Bool) => true
  rule isKResult(_) => false [owise]
endmodule

We have introduced three major changes to this definition. First, we have
removed the Val sort. We replace it instead with a function isKResult.
The function in question must have the same signature and attributes as seen in
this example. It ought to return true whenever a term should not be heated
(because it is a value) and false when it should be heated (because it is not
a value). We thus also insert isKResult calls in the side condition of two
of the heating rules, where the Val sort was previously used.

Second, we have removed the rule priorities on the heating rules and the use of
the Val sort on the cooling rules, and replaced them with the heat and
cool attributes. These attributes instruct the compiler that these rules are
heating and cooling rules, and thus should implicitly apply only when certain
terms on the LHS either are or are not a KResult (i.e., isKResult returns
true versus false).

Third, we have renamed some of the variables in the heating and cooling rules
to the special variable HOLE. Syntactically, HOLE is just a special name
for a variable, but it is treated specially by the compiler. By naming a
variable HOLE, we have informed the compiler which term is being heated
or cooled. The compiler will automatically add the side condition
requires isKResult(HOLE) to cooling rules and the side condition
requires notBool isKResult(HOLE) to heating rules.

Exercise

Modify LESSON-14-A to add rules to evaluate integer subtraction.

Simplifying further with Contexts

The above example is still rather cumbersome to write. We must explicitly write
both the heating and the cooling rule separately, even though they are
essentially inverses of one another. It would be nice to instead simply
indicate which terms should be heated and cooled, and what part of them to
operate on.

To do this, K introduces a new type of sentence, the context. Contexts
begin with the context keyword instead of the rule keyword, and usually
do not contain a rewrite operator.

Consider the following definition which is equivalent to LESSON-14-A
(lesson-14-b.k):

module LESSON-14-B-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  imports BOOL-SYNTAX

  syntax Exp ::= Int
               | Bool
               > left: Exp "+" Exp
               > left: Exp "&&" Exp
endmodule

module LESSON-14-B
  imports LESSON-14-B-SYNTAX
  imports INT
  imports BOOL

  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>
  rule <k> B1:Bool && B2:Bool => B1 andBool B2 ...</k>

  context <k> E:Exp + HOLE:Exp ...</k>
    requires isKResult(E)
  context <k> HOLE:Exp + _:Exp ...</k>
  context <k> E:Exp && HOLE:Exp ...</k>
    requires isKResult(E)
  context <k> HOLE:Exp && _:Exp ...</k>

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_:Bool) => true
  rule isKResult(_) => false [owise]
endmodule

In this example, the heat and cool rules have been removed entirely, as
have been the productions defining the freezers. Don't worry, they still exist
under the hood; the compiler is just generating them automatically. For each
context sentence like above, the compiler generates a #freezer production,
a heat rule, and a cool rule. The generated form is equivalent to the
rules we wrote manually in LESSON-14-A. However, we are now starting to
considerably simplify the definition. Instead of 3 sentences, we just have one.
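
For example, for the context <k> HOLE:Exp + _:Exp ...</k>, the generated
sentences look roughly like the following sketch (the freezer name here is
illustrative; the compiler chooses its own generated name):

  syntax KItem ::= #freezer0(Exp)
  rule <k> HOLE:Exp + E:Exp => HOLE ~> #freezer0(E) ...</k> [heat]
  rule <k> HOLE:Exp ~> #freezer0(E) => HOLE + E ...</k> [cool]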

context alias sentences and the strict and seqstrict attributes

Notice that the contexts we included in LESSON-14-B still seem rather
similar in form. For each expression we want to evaluate, we are declaring
one context for each operand of that expression, and they are each rather
similar to one another. We would like to be able to simplify further by
simply annotating each expression production with information about how
it is to be evaluated instead. We can do this with the seqstrict attribute.

Consider the following definition, once again equivalent to those above
(lesson-14-c.k):

module LESSON-14-C-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  imports BOOL-SYNTAX

  syntax Exp ::= Int
               | Bool
               > left: Exp "+" Exp [seqstrict(exp; 1, 2)]
               > left: Exp "&&" Exp [seqstrict(exp; 1, 2)]
endmodule

module LESSON-14-C
  imports LESSON-14-C-SYNTAX
  imports INT
  imports BOOL

  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>
  rule <k> B1:Bool && B2:Bool => B1 andBool B2 ...</k>

  context alias [exp]: <k> HERE ...</k>

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_:Bool) => true
  rule isKResult(_) => false [owise]
endmodule

This definition has two important changes from the one above. The first is
that the individual context sentences have been removed and have been
replaced with a single context alias sentence. You may notice that this
sentence begins with an identifier in square brackets followed by a colon. This
syntax is a way of naming individual sentences in K for reference by the tool
or by other sentences. The context alias sentence also has a special variable
HERE.

The second is that the productions in LESSON-14-C-SYNTAX have been given a
seqstrict attribute. The value of this attribute has two parts. The first
is the name of a context alias sentence. The second is a comma-separated list
of integers. Each integer represents an index of a non-terminal in the
production, counting from 1. For each integer present, the compiler implicitly
generates a new context sentence according to the following rules:

  1. The compiler starts by looking for the context alias sentence named. If
    there is more than one, then one context sentence is created per
    context alias sentence with that name.
  2. For each context created, the variable HERE in the context alias is
    substituted with an instance of the production the seqstrict attribute is
    attached to. Each child of that production is a variable. The non-terminal
    indicated by the integer offset of the seqstrict attribute is given the name
    HOLE.
  3. For each integer offset earlier in the list than the one currently being
    processed, the predicate isKResult(E) is conjoined into the side
    condition, where E is the child of the production term at that offset,
    counting from 1. For example, if the attribute lists 1, 2, then the
    context generated for the 2 will include isKResult(E1), where E1 is the
    first child of the production.

As you can see if you work through the process, the above code will ultimately
generate the same contexts present in LESSON-14-B.
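
For example, working through these steps for
syntax Exp ::= Exp "+" Exp [seqstrict(exp; 1, 2)] yields roughly the
following two contexts (variable names are illustrative):

  // from offset 1: no earlier offsets, so no side condition
  context <k> HOLE:Exp + _:Exp ...</k>
  // from offset 2: offset 1 precedes it, so its child must be a KResult
  context <k> E1:Exp + HOLE:Exp ...</k>
    requires isKResult(E1)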

Finally, note that there are a few minor syntactic conveniences provided by the
seqstrict attribute. First, in the special case of the context alias sentence
being <k> HERE ...</k>, you can omit both the context alias sentence
and the name from the seqstrict attribute.

Second, if the numbered list of offsets contains every non-terminal in the
production, it can be omitted from the attribute value.

Thus, we can finally produce the idiomatic K definition for this example
(lesson-14-d.k):

module LESSON-14-D-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  imports BOOL-SYNTAX

  syntax Exp ::= Int
               | Bool
               > left: Exp "+" Exp [seqstrict]
               > left: Exp "&&" Exp [seqstrict]
endmodule

module LESSON-14-D
  imports LESSON-14-D-SYNTAX
  imports INT
  imports BOOL

  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>
  rule <k> B1:Bool && B2:Bool => B1 andBool B2 ...</k>

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_:Bool) => true
  rule isKResult(_) => false [owise]
endmodule

Exercise

Modify LESSON-14-D to add a production and rule to evaluate integer
subtraction.

Nondeterministic evaluation order with the strict attribute

Thus far, we have focused entirely on deterministic evaluation order. However,
not all languages are deterministic in the order they evaluate expressions.
For example, in C, the expression a() + b() + c() is guaranteed to parse
to (a() + b()) + c(), but it is not guaranteed that a will be called before
b before c. In fact, this evaluation order is non-deterministic.

We can express non-deterministic evaluation orders with the strict attribute.
Its behavior is identical to the seqstrict attribute, except that step 3 in
the above list (with the side condition automatically added) does not take
place. In other words, if we wrote syntax Exp ::= Exp "+" Exp [strict]
instead of syntax Exp ::= Exp "+" Exp [seqstrict], it would generate the
following two contexts instead of the ones found in LESSON-14-B:

  context <k> _:Exp + HOLE:Exp ...</k>
  context <k> HOLE:Exp + _:Exp ...</k>

As you can see, these contexts will generate heating rules that can both
apply to the same term. As a result, the choice of which heating rule
applies first is non-deterministic, and as we saw in Lesson 1.13, we can
get all possible behaviors by passing --search to krun.

Exercises

  1. Add integer division to LESSON-14-D. Make division and addition strict
    instead of seqstrict, and write a rule evaluating integer division with a
    side condition that the denominator is non-zero. Run krun --search on the
    program 1 / 0 + 2 / 1 and observe all possible outputs of the program. How
    many are there total, and why?

  2. Modify your solution from lesson 1.11 problem 2 to remove the eval
    function and instead evaluate expressions from left to right using the
    seqstrict attribute.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.15: Configuration Declarations and Cell Nesting.

Lesson 1.15: Configuration Declarations and Cell Nesting

The purpose of this lesson is to explain how to store additional information
about the state of your interpreter by declaring cells using the
configuration sentence, as well as how to add additional inputs to your
definition.

Cells and Configuration Declarations

We have already covered the absolute basics of cells in K by looking at the
<k> cell. As explained in Lesson 1.13, the
<k> cell is available without being explicitly declared. It turns out this is
because, if the user does not explicitly specify a configuration sentence
anywhere in the main module of their definition, the configuration sentence
from the DEFAULT-CONFIGURATION module of
kast.md is imported
automatically. Here is what that sentence looks like:

  configuration <k> $PGM:K </k>

This configuration declaration declares a single cell, the <k> cell. It also
declares that at the start of rewriting, the contents of that cell should be
initialized with the value of the $PGM configuration variable.
Configuration variables function as inputs to krun. These terms are supplied
to krun in the form of ASTs parsed using a particular module. By default, the
$PGM configuration variable uses the main syntax module of the definition.

The cast on the configuration variable also specifies the sort that is used as
the entry point to the parser, in this case the K sort. It is often
useful to cast to other sorts there as well for better control over the accepted
language. The sort used for the $PGM variable is referred to as the start
symbol. During parsing, the default start symbol K subsumes all user-defined
sorts except for syntactic lists. These are excluded because they will always
produce an ambiguity error when parsing a single element.
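
For example, a definition whose programs are expressions might declare the
following (a sketch, assuming an Exp sort is declared in the main syntax
module):

  configuration <k> $PGM:Exp </k>

With this declaration, krun parses its input using Exp as the start symbol,
rejecting any input that is not a valid Exp.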

Note that we did not explicitly specify the $PGM configuration variable when
we invoked krun on a file. This is because krun handles the $PGM variable
specially, and allows you to pass the term for that variable via a file passed
as a positional argument to krun. We did, however, specify the PGM name
explicitly when we called krun with the -cPGM command line argument in
Lesson 1.2. This is the other, explicit, way of
specifying an input to krun.

This explains the most basic use of configuration declarations in K. We can,
however, declare multiple cells and multiple configuration variables. We can
also specify the initial values of cells statically, rather than dynamically
via krun.

For example, consider the following definition (lesson-15-a.k):

module LESSON-15-A-SYNTAX
  imports INT-SYNTAX

  syntax Ints ::= List{Int,","}
endmodule

module LESSON-15-A
  imports LESSON-15-A-SYNTAX
  imports INT

  configuration <k> $PGM:Ints </k>
                <sum> 0 </sum>

  rule <k> I:Int, Is:Ints => Is ...</k>
       <sum> SUM:Int => SUM +Int I </sum>
endmodule

This simple definition takes a list of integers as input and sums them
together. Here we have declared two cells: <k> and <sum>. Unlike <k>,
<sum> does not get initialized via a configuration variable, but instead
is initialized statically with the value 0.

Note the rule in the second module: we have explicitly specified multiple
cells in a single rule. K will expect each of these cells to match in order for
the rule to apply.
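
For example, after kompiling this definition, we can run it on a small input
(the file name here is illustrative). If input.ints contains 1, 2, 3, then:

  kompile lesson-15-a.k
  krun input.ints

should terminate with <sum> 6 </sum> in the final configuration, with the
<k> cell containing only the empty list of integers.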

Here is a second example (lesson-15-b.k):

module LESSON-15-B-SYNTAX
  imports INT-SYNTAX
endmodule

module LESSON-15-B
  imports LESSON-15-B-SYNTAX
  imports INT
  imports BOOL

  configuration <k> . </k>
                <first> $FIRST:Int </first>
                <second> $SECOND:Int </second>

  rule <k> . => FIRST >Int SECOND </k>
       <first> FIRST </first>
       <second> SECOND </second>
endmodule

This definition takes two integers as command-line arguments and populates the
<k> cell with a Boolean indicating whether the first integer is greater than
the second. Notice that we have specified no $PGM configuration variable
here. As a result, we cannot invoke krun via the syntax krun $file.
Instead, we must explicitly pass values for each configuration variable via the
-cFIRST and -cSECOND command line flags. For example, if we invoke
krun -cFIRST=0 -cSECOND=1, we will get the value false in the K cell.

You can also specify both a $PGM configuration variable and other
configuration variables in a single configuration declaration, in which case
you would be able to initialize $PGM with either a positional argument or the
-cPGM command line flag, but the other configuration variables would need
to be explicitly initialized with -c.
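
For example, a declaration along the following lines (the $FLAG variable is
illustrative) combines both styles:

  configuration <k> $PGM:K </k>
                <flag> $FLAG:Bool </flag>

Here you could pass the program via a positional argument while supplying the
flag explicitly, as in krun program.txt -cFLAG=true.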

Exercise

Modify your solution to Lesson 1.14, Problem 2 to add a new cell with a
configuration variable of sort Bool. This variable should determine whether
the / operator is evaluated using /Int or divInt. Test that by specifying
different values for this variable, you can change the behavior of rounding on
division of negative numbers.

Cell Nesting

It is possible to nest cells inside one another. A cell that contains other
cells must contain only other cells; by nesting cells, you can give the
configuration a hierarchical structure. Consider the following
definition which is equivalent to the one in LESSON-15-B (lesson-15-c.k):

module LESSON-15-C-SYNTAX
  imports INT-SYNTAX
endmodule

module LESSON-15-C
  imports LESSON-15-C-SYNTAX
  imports INT
  imports BOOL

  configuration <T>
                  <k> . </k>
                  <state>
                    <first> $FIRST:Int </first>
                    <second> $SECOND:Int </second>
                  </state>
                </T>

  rule <k> . => FIRST >Int SECOND </k>
       <first> FIRST </first>
       <second> SECOND </second>
endmodule

Note that we have added some new cells to the configuration declaration:
the <T> cell wraps the entire configuration, and the <state> cell is
introduced around the <first> and <second> cells.

However, we have not changed the rule in this definition. This is because of
a concept in K called configuration abstraction. K allows you to specify
any number of cells in a rule (except zero) in any order you want, and K will
compile the rules into a form that matches the structure of the configuration
specified by the configuration declaration.

Here then, is how this rule would look after the configuration abstraction
has been resolved:

  rule <T>
         <k> . => FIRST >Int SECOND </k>
         <state>
           <first> FIRST </first>
           <second> SECOND </second>
         </state>
       </T>

In other words, K will complete cells to the top of the configuration by
inserting parent cells where appropriate based on the declared structure of
the configuration. This is useful because as a definition evolves, the
configuration may change, but you don't want to have to modify every single
rule each time. Thus, K follows the principle that you should only mention the
cells in a rule that are actually needed in order to accomplish its specific
goal. By following this best practice, you can significantly increase the
modularity of the definition and make it easier to maintain and modify.

Exercise

Modify your definition from the previous exercise in this lesson to wrap the
two cells you have declared in a top cell <T>. You should not have to change
any other rules in the definition.

Cell Variables

Sometimes it is desirable to explicitly match a variable against certain
fragments of the configuration. Because K's configuration is hierarchical,
we can grab subsets of the configuration as if they were just another term.
However, configuration abstraction applies here as well.
In particular, for each cell you specify in a configuration declaration, a
unique sort is assigned for that cell with a single constructor (the cell
itself). The sort name is taken by removing all special characters,
capitalizing the first letter and each letter after a hyphen, and adding the
word Cell at the end. For example, in the above example, the cell sorts are
TCell, KCell, StateCell, FirstCell, and SecondCell. If we had declared
a cell as <first-number>, then the cell sort name would be FirstNumberCell.

You can explicitly reference a variable of one of these sorts anywhere you
might instead write that cell. For example, consider the following rule:

  rule <k> true => S </k>
       (S:StateCell => <state>... .Bag ...</state>)

Here we have introduced two new concepts. The first is the variable of sort
StateCell, which matches the entire <state> part of the configuration. The
second is that we have introduced the concept of ... once again. When a cell
contains other cells, it is also possible to specify ... on either the left,
right, or both sides of the cell term; in this case, all three syntaxes are
equivalent. When they appear on the left-hand side of a rule, they
indicate that we don't care what value any cells not explicitly named might
have. For example, we might write <state>... <first> 0 </first> ...</state> on
the left-hand side of a rule in order to indicate that we want to match the
rule when the <first> cell contains a zero, regardless of what the <second>
cell contains. If we had not included this ellipsis, it would have been a
syntax error, because K would have expected you to provide a value for each of
the child cells.

However, if, as in the example above, the ... appears on the right-hand side
of a rule, this instead indicates that the cells not explicitly mentioned under
the cell should be re-initialized with their default values from the
configuration declaration. In other words, that rule will reset the values of
<first> and <second> to their initial values.

You may note the presence of the phrase .Bag here. You can think of this as
the empty set of cells. It is used as the child of a cell when you want to
indicate that no cells should be explicitly named. We will cover other uses
of this term in later lessons.

Exercises

  1. Modify the definition from the previous exercise in this lesson so that the
    Boolean cell you created is initialized to false. Then add a production
    syntax Stmt ::= Bool ";" Exp, and a rule that uses this Stmt to set the
    value of the Boolean flag. Then add another production
    syntax Stmt ::= "reset" ";" Exp which sets the value of the Boolean flag back
    to its default value via a ... on the right-hand side. You will need to add
    an additional cell around the Boolean cell to make this work.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.16: Maps, Semantic Lists, and Sets.

Lesson 1.16: Maps, Semantic Lists, and Sets

The purpose of this lesson is to explain how to use the data structure sorts
provided by K: maps, lists, and sets.

Maps

The most frequently used type of data structure in K is the map. The sort
provided by K for this purpose is the Map sort, and it is provided in
domains.md in the MAP
module. This type is not (currently) polymorphic. All Map terms are maps that
map terms of sort KItem to other terms of sort KItem. A KItem can contain
any sort except a K sequence. If you need to store such a term in a
map, you can always use a wrapper such as syntax KItem ::= kseq(K).

A Map pattern consists of zero or more map elements (as represented by the
symbol syntax Map ::= KItem "|->" KItem), mixed in any order, separated by
whitespace, with zero or one variables of sort Map. The empty map is
represented by .Map. If all of the bindings for the variables in the keys
of the map can be deterministically chosen, these patterns can be matched in
O(1) time. If they cannot, then each map element that cannot be
deterministically constructed contributes a single dimension of polynomial
time to the cost of the matching. In other words, a single such element is
linear, two are quadratic, three are cubic, etc.

Patterns like the above are the only type of Map pattern that can appear
on the left-hand-side of a rule. In other words, you are not allowed to write
a Map pattern on the left-hand-side with more than one variable of sort Map
in it. You are, however, allowed to write such patterns on the right-hand-side
of a rule. You can also write a function pattern in the key of a map element
so long as all the variables in the function pattern can be deterministically
chosen.

Note the meaning of matching on a Map pattern: a map pattern with no
variables of sort Map will match if the map being matched has exactly as
many bindings as |-> symbols in the pattern. It will then match if each
binding in the map pattern matches exactly one distinct binding in the map
being matched. A map pattern with one Map variable will also match any map
that contains such a map as a subset. The variable of sort Map will be bound
to whatever bindings are left over (.Map if there are no bindings left over).
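
As an illustration, matching the pattern X |-> I M:Map against the map
a |-> 1 b |-> 2 can succeed in two ways: X binds a, I binds 1, and M binds
b |-> 2, or X binds b, I binds 2, and M binds a |-> 1. A pattern with no
Map variable, such as X |-> I alone, fails to match this map, because the
map has two bindings while the pattern has exactly one element.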

Here is an example of a simple definition that implements a very basic
variable declaration semantics using a Map to store the value of variables
(lesson-16-a.k):

module LESSON-16-A-SYNTAX
  imports INT-SYNTAX
  imports ID-SYNTAX

  syntax Exp ::= Id | Int
  syntax Decl ::= "int" Id "=" Exp ";" [strict(2)]
  syntax Pgm ::= List{Decl,""}
endmodule

module LESSON-16-A
  imports LESSON-16-A-SYNTAX
  imports BOOL

  configuration <T>
                  <k> $PGM:Pgm </k>
                  <state> .Map </state>
                </T>

  // declaration sequence
  rule <k> D:Decl P:Pgm => D ~> P ...</k>
  rule <k> .Pgm => . ...</k>

  // variable declaration
  rule <k> int X:Id = I:Int ; => . ...</k>
       <state> STATE => STATE [ X <- I ] </state>

  // variable lookup
  rule <k> X:Id => I ...</k>
       <state>... X |-> I ...</state>

  syntax Bool ::= isKResult(K) [symbol, function]
  rule isKResult(_:Int) => true
  rule isKResult(_) => false [owise]
endmodule

There are several new features in this definition. First, note we import
the module ID-SYNTAX. This module is defined in domains.md and provides a
basic syntax for identifiers. We are using the Id sort provided by this
module in this definition to implement the names of program variables. This
syntax is only imported when parsing programs, not when parsing rules. Later in
this lesson we will see how to reference specific concrete identifiers in a
rule.

Second, we introduce a single new function over the Map sort. This function,
which is represented by the symbol
syntax Map ::= Map "[" KItem "<-" KItem "]", represents the map update
operation. Other functions over the Map sort can be found in domains.md.

Finally, we have used the ... syntax on a cell containing a Map. In this
case, <state>... Pattern ...</state>, <state>... Pattern </state>, and
<state> Pattern ...</state> all mean the same thing: each is equivalent to
writing <state> (Pattern) _:Map </state>.

Consider the following program (a.decl):

int x = 0;
int y = 1;
int a = x;

If we run this program with krun, we will get the following result:

<T>
  <k>
    .
  </k>
  <state>
    a |-> 0
    x |-> 0
    y |-> 1
  </state>
</T>

Note that krun has automatically sorted the collection for you. This doesn't
happen at runtime, so you still get the performance of a hash map, but it will
help make the output more readable.

Exercise

Create a sort Stmt that is a subsort of Decl. Create a production of sort
Stmt for variable assignment in addition to the variable declaration
production. Feel free to use the syntax syntax Stmt ::= Id "=" Exp ";". Write
a rule that implements variable assignment using a map update function. Then
write the same rule using a map pattern. Test your implementations with some
programs to ensure they behave as expected.

Semantic Lists

In a previous lesson, we explained how to represent lists in the AST of a
program. However, this is not the only context where lists can be used. We also
frequently use lists in the configuration of an interpreter in order to
represent certain types of program state. For this purpose, it is generally
useful to have an associative-list sort, rather than the cons-list sorts
provided in Lesson 1.12.

The type provided by K for this purpose is the List sort, and it is also
provided in domains.md, in the LIST module. This type is also not
(currently) polymorphic. Like Map, all List terms are lists of terms of the
KItem sort.

A List pattern in K consists of zero or more list elements (as represented by
the ListItem symbol), followed by zero or one variables of sort List,
followed by zero or more list elements. An empty list is represented by
.List. These patterns can be matched in O(log(N)) time. This is the only
type of List pattern that can appear on the left-hand-side of a rule. In
other words, you are not allowed to write a List pattern on the
left-hand-side with more than one variable of sort List in it. You are,
however, allowed to write such patterns on the right-hand-side of a rule.

Note the meaning of matching on a List pattern: a list pattern with no
variables of sort List will match if the list being matched has exactly as
many elements as ListItem symbols in the pattern. It will then match if each
element in sequence matches the pattern contained in the ListItem symbol. A
list pattern with one variable of sort List operates the same way, except
that it can match any list with at least as many elements as ListItem
symbols, so long as the prefix and suffix of the list match the patterns inside
the ListItem symbols. The variable of sort List will be bound to whatever
elements are left over (.List if there are no elements left over).

The ... syntax is allowed on cells containing lists as well. In this case,
the meaning of <cell>... Pattern </cell> is the same as
<cell> _:List (Pattern) </cell>, the meaning of <cell> Pattern ...</cell>
is the same as <cell> (Pattern) _:List</cell>. Because list patterns with
multiple variables of sort List are not allowed, it is an error to write
<cell>... Pattern ...</cell>.
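
As an illustration, the pattern ListItem(X) L:List matches any non-empty
list, binding X to the first element and L to the remainder, while
L:List ListItem(X) instead binds X to the last element. Combining the two,
ListItem(X) _:List ListItem(Y) matches any list with at least two elements,
binding X to the first element and Y to the last.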

Here is an example of a simple definition that implements a very basic
function-call semantics using a List as a function stack (lesson-16-b.k):

module LESSON-16-B-SYNTAX
  imports INT-SYNTAX
  imports ID-SYNTAX

  syntax Exp ::= Id "(" ")" | Int
  syntax Stmt ::= "return" Exp ";" [strict]
  syntax Decl ::= "fun" Id "(" ")" "{" Stmt "}"
  syntax Pgm ::= List{Decl,""}
  syntax Id ::= "main" [token]
endmodule

module LESSON-16-B
  imports LESSON-16-B-SYNTAX
  imports BOOL
  imports LIST

  configuration <T>
                  <k> $PGM:Pgm ~> main () </k>
                  <functions> .Map </functions>
                  <fstack> .List </fstack>
                </T>

  // declaration sequence
  rule <k> D:Decl P:Pgm => D ~> P ...</k>
  rule <k> .Pgm => . ...</k>

  // function definitions
  rule <k> fun X:Id () { S } => . ...</k>
       <functions>... .Map => X |-> S ...</functions>

  // function call
  syntax KItem ::= stackFrame(K)
  rule <k> X:Id () ~> K => S </k>
       <functions>... X |-> S ...</functions>
       <fstack> .List => ListItem(stackFrame(K)) ...</fstack>

  // return statement
  rule <k> return I:Int ; ~> _ => I ~> K </k>
       <fstack> ListItem(stackFrame(K)) => .List ...</fstack>

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_) => false [owise]
endmodule

Notice that we have declared the production syntax Id ::= "main" [token].
Since we use the ID-SYNTAX module, this declaration is necessary in order to
be able to refer to the main identifier directly in the configuration
declaration. Our <k> cell now contains a K sequence initially: first we
process all the declarations in the program, then we call the main function.

Consider the following program (foo.func):

fun foo() { return 5; }
fun main() { return foo(); }

When we krun this program, we should get the following output:

<T>
  <k>
    5 ~> .
  </k>
  <functions>
    foo |-> return 5 ;
    main |-> return foo ( ) ;
  </functions>
  <fstack>
    .List
  </fstack>
</T> 

Note that we have successfully placed the value returned by the main
function on the <k> cell.

Exercise

Add a term of sort Id to the stackFrame operator to keep track of the
name of the function in that stack frame. Then write a function
syntax String ::= printStackTrace(List) that takes the contents of the
<fstack> cell and pretty prints the current stack trace. You can concatenate
strings with +String in the STRING module in domains.md, and you can
convert an Id to a String with the Id2String function in the ID module.
Test this function by creating a new expression that returns the current stack
trace as a string. Make sure to update isKResult and the Exp sort as
appropriate to allow strings as values.

Sets

The final primary data structure sort in K is a set, i.e., an idempotent
unordered collection where elements are deduplicated. The sort provided by K
for this purpose is the Set sort and it is provided in domains.md in the
SET module. Like maps and lists, this type is not (currently) polymorphic.
Like Map and List, all Set terms are sets of terms of the KItem sort.

A Set pattern has the exact same restrictions as a Map pattern, except that
its elements are treated like keys, and there are no values. It has the same
performance characteristics as well. However, syntactically it is more similar
to the List sort: An empty Set is represented by .Set, but a set element
is represented by the SetItem symbol.

Matching behaves similarly to the Map sort: a set pattern with no variables
of sort Set will match if the set has exactly as many elements as SetItem
symbols, and if each element pattern matches one distinct element in the set.
A set with a variable of sort Set also matches any superset of such a set.
As with map, the elements left over will be bound to the Set variable (or
.Set if no elements are left over).

Like Map, the ... syntax on a set is syntactic sugar for an anonymous
variable of sort Set.
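
As an illustration, matching the pattern SetItem(X) S:Set against the set
SetItem(1) SetItem(2) can bind X to 1 with S bound to SetItem(2), or X to 2
with S bound to SetItem(1); the choice made during concrete execution is not
specified, so rules should not depend on it.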

Here is an example of a simple modification to LESSON-16-A which uses a Set
to ensure that variables are never declared more than once. In practice, you
would likely just use the in_keys symbol over maps to test for this, but
it's still useful as an example of sets in practice:

module LESSON-16-C-SYNTAX
  imports LESSON-16-A-SYNTAX
endmodule

module LESSON-16-C
  imports LESSON-16-C-SYNTAX
  imports BOOL
  imports SET

  configuration <T>
                  <k> $PGM:Pgm </k>
                  <state> .Map </state>
                  <declared> .Set </declared>
                </T>

  // declaration sequence
  rule <k> D:Decl P:Pgm => D ~> P ...</k>
  rule <k> .Pgm => . ...</k>

  // variable declaration
  rule <k> int X:Id = I:Int ; => . ...</k>
       <state> STATE => STATE [ X <- I ] </state>
       <declared> D => D SetItem(X) </declared>
    requires notBool X in D

  // variable lookup
  rule <k> X:Id => I ...</k>
       <state>... X |-> I ...</state>
       <declared>... SetItem(X) ...</declared>

  syntax Bool ::= isKResult(K) [symbol, function]
  rule isKResult(_:Int) => true
  rule isKResult(_) => false [owise]
endmodule

Now if we krun a program containing duplicate declarations, it will get
stuck on the declaration.

Exercises

  1. Modify your solution to Lesson 1.14, Problem 2 and introduce the sorts
    Decls, Decl, and Stmt which include variable and function declaration
    (without function parameters), and return and assignment statements, as well
    as call expressions. Use List and Map to implement these operators, making
    sure to consider the interactions between components, such as saving and
    restoring the environment of variables at each call site. Don't worry about
    local function definitions or global variables for now. Make sure to test the
    resulting interpreter.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.17: Cell Multiplicity and Cell Collections.

Lesson 1.17: Cell Multiplicity and Cell Collections

The purpose of this lesson is to explain how you can create optional cells
and cells that repeat multiple times in a configuration using a feature called
cell multiplicity.

Cell Multiplicity

K allows you to specify attributes for cell productions as part of the syntax
of configuration declarations. Unlike regular productions, which use the []
syntax for attributes, configuration cells use an XML-like attribute syntax:

configuration <k color="red"> $PGM:K </k>

This configuration declaration gives the <k> cell the color red during
unparsing using the color attribute as discussed in
Lesson 1.9.

However, in addition to the usual attributes for productions, there are some
other attributes that can be applied to cells with special meaning. One such
attribute is the multiplicity attribute. By default, each cell that is
declared occurs exactly once in every configuration term. However, using the
multiplicity attribute, this default behavior can be changed. There are two
values that this attribute can have: ? and *.

Optional cells

The first cell multiplicity we will discuss is ?. Similar to a regular
expression language, this attribute tells the compiler that this cell can
appear 0 or 1 times in the configuration. In other words, it is an
optional cell. By default, K does not create optional cells in the initial
configuration, unless that optional cell has a configuration variable inside
it. However, it is possible to override the default behavior and create that
cell initially by adding the additional cell attribute initial="".

K uses the .Bag symbol to represent the absence of any cells in a particular
rule. Consider the following module:

module LESSON-17-A
  imports INT

  configuration <k> $PGM:K </k>
                <optional multiplicity="?"> 0 </optional>

  syntax KItem ::= "init" | "destroy"

  rule <k> init => . ...</k>
       (.Bag => <optional> 0 </optional>)
  rule <k> destroy => . ...</k>
       (<optional> _ </optional> => .Bag)

endmodule

In this definition, when the init symbol is executed, the <optional> cell
is added to the configuration, and when the destroy symbol is executed, it
is removed. Any rule that matches on that cell will only match if that cell is
present in the configuration.
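
For example, a rule like the following sketch (the get construct is
hypothetical) can apply only while the <optional> cell exists; if the cell
has been destroyed, the configuration simply gets stuck at get:

  syntax KItem ::= "get"
  rule <k> get => I ...</k>
       <optional> I </optional>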

Exercise

Create a simple definition with a Stmts sort that is a List{Stmt,""} and
a Stmt sort with the constructors
syntax Stmt ::= "enable" | "increment" | "decrement" | "disable". The
configuration should have an optional cell that contains an integer that
is created with the enable command, destroyed with the disable command,
and its value is incremented or decremented by the increment and decrement
command.

Cell collections

The second type of cell multiplicity we will discuss is *. Similar to a
regular expression language, this attribute tells the compiler that this cell
can appear 0 or more times in the configuration. In other words, it is a
cell collection. Cells with multiplicity * must be the only child of
their parent cell. As a convention, the inner cell is usually named with the
singular form of what it contains, and the outer cell with the plural form, for
example, "thread" and "threads".

All cell collections are required to have the type attribute set to either
Set or Map. A Set cell collection is represented as a set and behaves
internally the same as the Set sort, although it actually declares a new
sort. A Map cell collection is represented as a Map in which the first
subcell of the cell collection is the key and the remaining cells are the
value.

For example, consider the following module:

module LESSON-17-B
  imports INT
  imports BOOL
  imports ID-SYNTAX

  syntax Stmt ::= Id "=" Exp ";" [strict(2)]
                | "return" Exp ";" [strict]
  syntax Stmts ::= List{Stmt,""}
  syntax Exp ::= Id 
               | Int 
               | Exp "+" Exp [seqstrict]
               | "spawn" "{" Stmts "}"
               | "join" Exp ";" [strict]

  configuration <threads>
                  <thread multiplicity="*" type="Map">
                    <id> 0 </id>
                    <k> $PGM:K </k>
                  </thread>
                </threads>
                <state> .Map </state>
                <next-id> 1 </next-id>

  rule <k> X:Id => I:Int ...</k>
       <state>... X |-> I ...</state>
  rule <k> X:Id = I:Int ; => . ...</k>
       <state> STATE => STATE [ X <- I ] </state>
  rule <k> S:Stmt Ss:Stmts => S ~> Ss ...</k>
  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>

  rule <thread>...
         <k> spawn { Ss } => NEXTID ...</k>
       ...</thread>
       <next-id> NEXTID => NEXTID +Int 1 </next-id>
       (.Bag => 
       <thread>
         <id> NEXTID </id>
         <k> Ss </k>
       </thread>)

  rule <thread>...
         <k> join ID:Int ; => I ...</k>
       ...</thread>
       (<thread>
         <id> ID </id>
         <k> return I:Int ; ...</k>
       </thread> => .Bag)

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_) => false [owise]
endmodule

This module implements a very basic fork/join semantics. The spawn expression
spawns a new thread to execute a sequence of statements and returns a thread
id, and the join statement waits until a thread executes return and then
returns the return value of the thread.

Note something quite novel here: the <k> cell is inside a cell of
multiplicity *. Since the <k> cell is just a regular cell (mostly), this
is perfectly allowable. Rules that don't mention a specific thread are
automatically completed to match any thread.
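
For instance, once configuration abstraction has been resolved, the variable
lookup rule above behaves roughly as if it had been written like this sketch,
and can therefore apply to the <k> cell of any thread:

  rule <threads>...
         <thread>...
           <k> X:Id => I:Int ...</k>
         ...</thread>
       ...</threads>
       <state>... X |-> I ...</state>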

When you execute programs in this language, the cells in the cell collection
get sorted and printed like any other collection, but they still display like
cells. Rules in this language also benefit from all the structural power of
cells, allowing you to omit cells you don't care about or complete the
configuration automatically. This allows you to have the power of cells while
still being a collection under the hood.

Exercises

  1. Modify the solution from Lesson 1.16, Problem 1 so that the cell you use to
    keep track of functions in a Map is now a cell collection. Run some programs
    and compare how they get unparsed before and after this change.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.18: Term Equality and the Ternary Operator.

Lesson 1.18: Term Equality and the Ternary Operator

The purpose of this lesson is to introduce how to compare equality of terms in
K, and how to put conditional expressions directly into the right-hand side of
rules.

Term Equality

One major way you can compare whether two terms are equal in K is to simply
match both terms with a variable with the same name. This will only succeed
in matching if the two terms are equal structurally. However, sometimes this
is impractical, and it is useful to have access to a way to actually compare
whether two terms in K are equal. The operator for this is found in
domains.md in the K-EQUAL
module. The operator is ==K and takes two terms of sort K and returns a
Bool. It returns true if they are equal. This includes equality over builtin
types such as Map and Set where equality is not purely structural in
nature. However, it does not include any notion of semantic equality over
user-defined syntax. The inverse symbol for inequality is =/=K.
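
For example, here is a minimal sketch of a rule using ==K (the same
construct is hypothetical):

  syntax KItem ::= same(K, K)
  rule <k> same(A, B) => A ==K B ...</k>

The check same(A, B) rewrites to true exactly when A and B are equal terms,
and to false otherwise.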

Ternary Operator

One way to introduce conditional logic in K is to have two separate rules,
each with a side condition (or one rule with a side condition and another with
the owise attribute). However, sometimes it is useful to explicitly write
a conditional expression directly in the right-hand side of a rule. For this
purpose, K defines one more operator in the K-EQUAL module, which corresponds
to the usual ternary operator found in many languages. Here is an example of its
usage (lesson-18.k):

module LESSON-18
  imports INT
  imports BOOL
  imports K-EQUAL

  syntax Exp ::= Int | Bool | "if" "(" Exp ")" Exp "else" Exp [strict(1)]

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_:Bool) => true

  rule if (B:Bool) E1:Exp else E2:Exp => #if B #then E1 #else E2 #fi
endmodule

Note the symbol on the right-hand side of the final rule. This symbol is
polymorphic: B must be of sort Bool, but E1 and E2 could have been
any sort so long as both were of the same sort, and the sort of the entire
expression becomes equal to that sort. K supports polymorphic built-in
operators, but does not yet allow users to write their own polymorphic
productions.

The behavior of this function is to evaluate the Boolean expression to a
Boolean, then pick one of the two children and return it based on whether the
Boolean is true or false. Please note that it is not a good idea to use this
symbol in cases where one or both of the children is potentially undefined
(for example, an integer expression that divides by zero). While the default
implementation is smart enough to only evaluate the branch that happens to be
picked, this will not be true when we begin to do program verification. If
you need short circuiting behavior, it is better to use a side condition.

Exercises

  1. Write a function in K that takes two terms of sort K and returns an
    Int: the Int should be 0 if the terms are equal and 1 if the terms are
    unequal.

  2. Modify your solution to Lesson 1.16, Problem 1 and introduce an if
    Stmt to the syntax of the language, then implement it using the #if symbol.
    Make sure to write tests for the resulting interpreter.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.19: Debugging with GDB.

Lesson 1.19: Debugging with GDB

The purpose of this lesson is to teach how to debug your K interpreter using
the K-language support provided in GDB.

Caveats

Debugging K definitions using GDB is currently only supported on Linux; the
instructions in this section will not work properly on macOS. Support for
debugging K using LLDB is a work in progress, and this chapter will be updated
when doing so is possible.

Getting started

You will need GDB in order to complete this lesson. If you do not already
have GDB installed, then do so. Steps to install GDB are outlined in
this GDB Tutorial.

The first thing necessary in order to debug a K interpreter in GDB is to
build the interpreter with full debugging support enabled. This can be done
relatively simply. First, make sure you have not passed -O1, -O2, or -O3
to kompile. Second, simply add the command line flags -ccopt -g -ccopt -O1
to kompile. The resulting compiled K definition will be ready to support
debugging.

Note: the 'O' in -O1 is the letter 'O' not the number 0!

Once you have a compiled K definition and a program you wish to debug, you
can start the debugger by passing the --debugger flag to krun. This will
automatically load the program you are executing into GDB and drop you into
a GDB shell ready to start executing the program.

As an example, consider the following K definition (lesson-19-a.k):

module LESSON-19-A
  imports INT

  rule I => I +Int 1
    requires I <Int 100
endmodule

If we compile this definition with
kompile lesson-19-a.k -ccopt -g -ccopt -O1, and run the program 0 in the
debugger with krun -cPGM=0 --debugger, we will see the following output
(roughly):

GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./lesson-19-a-kompiled/interpreter...
warning: File "/home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-a-kompiled/interpreter" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
        add-auto-load-safe-path /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-a-kompiled/interpreter
line to your configuration file "/home/dwightguth/.gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/home/dwightguth/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"
(gdb)

To take full advantage of the GDB features of K, you should follow the first
command listed in this output message and add the corresponding
add-auto-load-safe-path command to your ~/.gdbinit file as prompted.
Please note that the path will be different on your machine than the one
listed above. Adding directories to the "load safe path" effectively tells GDB
to trust those directories. All content under a given directory will be recursively
trusted, so if you want to avoid having to add paths to the "load safe path" every
time you kompile a different K definition, then you can just trust a minimal
directory containing all your kompiled files; however, do not choose a
top-level directory containing arbitrary files, as this amounts to trusting
all of those files and is a security risk. More info on the load safe path
can be found here.

Basic commands

The most basic commands you can execute in the K GDB session are to run your
program or to step through it. The first can be accomplished using GDB's
built-in run command. This will automatically start the program and begin
executing it. It will continue until the program aborts or finishes, or the
debugger is interrupted with Ctrl-C.

Sometimes you want finer-grained control over how you proceed through the
program you are debugging. To step through the rule applications in your
program, you can use the k start and k step GDB commands.

k start is similar to the built-in start command in that it starts the
program and then immediately breaks before doing any work. However, unlike
the start command, which breaks immediately after the main method of
the program begins executing, the k start command will initialize the
rewriter, evaluate the initial configuration, and break immediately prior to
applying any rewrite steps.

In the example above, here is what we see when we run the k start command:

Temporary breakpoint 1 at 0x239210
Starting program: /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-a-kompiled/interpreter .krun-2021-08-13-14-10-50-sMwBkbRicw/tmp.in.01aQt85TaA -1 .krun-2021-08-13-14-10-50-sMwBkbRicw/result.kore
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Temporary breakpoint 1, 0x0000000000239210 in main ()
0x0000000000231890 in step (subject=<k>
  0 ~> .
</k>)
(gdb)

As you can see, we are stopped at the step function in the interpreter.
This function is responsible for taking top-level rewrite steps. The subject
parameter to this function is the current K configuration.

We can step through K rewrite steps one at a time by running the k step
command. By default, this takes a single rewrite step (including any function
rule applications that are part of that step).

Here is what we see when we run that command:

Continuing.

Temporary breakpoint -22, 0x0000000000231890 in step (subject=<k>
  1 ~> .
</k>)
(gdb)

As we can see, we have taken a single rewrite step. We can also pass a number
to the k step command which indicates the number of rewrite steps to take.

Here is what we see if we run k step 10:

Continuing.

Temporary breakpoint -23, 0x0000000000231890 in step (subject=<k>
  11 ~> .
</k>)
(gdb)

As we can see, ten rewrite steps were taken.

Breakpoints

The next important step in debugging an application in GDB is to be able to
set breakpoints. Generally speaking, there are three types of breakpoints we
are interested in when debugging a K semantics: setting a breakpoint when a
particular function is called, setting a breakpoint when a particular rule is
applied, and setting a breakpoint when a side condition of a rule is
evaluated.

The easiest way to do the first two things is to set a breakpoint on the
line of code containing the function or rule.

For example, consider the following K definition (lesson-19-b.k):

module LESSON-19-B
  imports BOOL

  syntax Bool ::= isBlue(Fruit) [function]
  syntax Fruit ::= Blueberry() | Banana()
  rule isBlue(Blueberry()) => true
  rule isBlue(Banana()) => false

  rule F:Fruit => isBlue(F)
endmodule

Once this program has been compiled for debugging, we can run the program
Blueberry(). We can then set a breakpoint that stops when the isBlue
function is called with the following command:

break lesson-19-b.k:4

Here is what we see if we set this breakpoint and then run the interpreter:

(gdb) break lesson-19-b.k:4
Breakpoint 1 at 0x231040: file /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b.k, line 4.
(gdb) run
Starting program: /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b-kompiled/interpreter .krun-2021-08-13-14-20-27-vXOQmV6lwS/tmp.in.fga98yqXlc -1 .krun-2021-08-13-14-20-27-vXOQmV6lwS/result.kore
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, LblisBlue'LParUndsRParUnds'LESSON-19-B'Unds'Bool'Unds'Fruit (_1=Blueberry ( )) at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b.k:4
4         syntax Bool ::= isBlue(Fruit) [function]
(gdb)

As we can see, we have stopped at the point where we are evaluating that
function. The value _1 that is a parameter to that function shows the
value passed to the function by the caller.

We can also break when the isBlue(Blueberry()) => true rule applies by simply
changing the line number to the line number of that rule:

(gdb) break lesson-19-b.k:6
Breakpoint 1 at 0x2af710: file /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b.k, line 6.
(gdb) run
Starting program: /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b-kompiled/interpreter .krun-2021-08-13-14-32-36-7kD0ic7XwD/tmp.in.8JNH5Qtmow -1 .krun-2021-08-13-14-32-36-7kD0ic7XwD/result.kore
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, apply_rule_138 () at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b.k:6
6         rule isBlue(Blueberry()) => true
(gdb)

We can also do the same with a top-level rule:

(gdb) break lesson-19-b.k:9
Breakpoint 1 at 0x2aefa0: lesson-19-b.k:9. (2 locations)
(gdb) run
Starting program: /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b-kompiled/interpreter .krun-2021-08-13-14-33-13-9fC8Sz4aO3/tmp.in.jih1vtxSiQ -1 .krun-2021-08-13-14-33-13-9fC8Sz4aO3/result.kore
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, apply_rule_107 (Var'Unds'DotVar0=<generatedCounter>
  0
</generatedCounter>, Var'Unds'DotVar1=., VarF=Blueberry ( )) at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b.k:9
9         rule F:Fruit => isBlue(F)
(gdb)

Unlike the function rule above, we see several parameters to this function.
These make up the substitution that was matched for the rule. Variables
appear in this substitution only if they are actually used on the right-hand
side of the rule.

Advanced breakpoints

Sometimes it is inconvenient to set the breakpoint based on a line number.
In such cases, it is also possible to set a breakpoint based on the rule label
of a particular rule. Consider the following definition (lesson-19-c.k):

module LESSON-19-C
  imports INT
  imports BOOL

  syntax Bool ::= isEven(Int) [function]
  rule [isEven]: isEven(I) => true requires I %Int 2 ==Int 0
  rule [isOdd]: isEven(I) => false requires I %Int 2 =/=Int 0

endmodule

We will run the program isEven(4). We can set a breakpoint for when a rule
applies by means of the MODULE-NAME.label.rhs syntax:

(gdb) break LESSON-19-C.isEven.rhs
Breakpoint 1 at 0x2afda0: file /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c.k, line 6.
(gdb) run
Starting program: /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c-kompiled/interpreter .krun-2021-08-13-14-40-29-LNNT8YEZ61/tmp.in.ZG93vWCGGC -1 .krun-2021-08-13-14-40-29-LNNT8YEZ61/result.kore
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, LESSON-19-C.isEven.rhs () at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c.k:6
6         rule [isEven]: isEven(I) => true requires I %Int 2 ==Int 0
(gdb)

We can also set a breakpoint for when a rule's side condition is evaluated
by means of the MODULE-NAME.label.sc syntax:

(gdb) break LESSON-19-C.isEven.sc
Breakpoint 1 at 0x2afd70: file /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c.k, line 6.
(gdb) run
Starting program: /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c-kompiled/interpreter .krun-2021-08-13-14-41-48-1BoGfJRbYc/tmp.in.kg4F8cwfCe -1 .krun-2021-08-13-14-41-48-1BoGfJRbYc/result.kore
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, LESSON-19-C.isEven.sc (VarI=4) at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c.k:6
6         rule [isEven]: isEven(I) => true requires I %Int 2 ==Int 0
(gdb) finish
Run till exit from #0  LESSON-19-C.isEven.sc (VarI=4) at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c.k:6
0x00000000002b2662 in LblisEven'LParUndsRParUnds'LESSON-19-C'Unds'Bool'Unds'Int (_1=4) at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c.k:5
5         syntax Bool ::= isEven(Int) [function]
Value returned is $1 = true
(gdb)

Here we have used the built-in GDB command finish to tell us whether the
side condition returned true or not. Note that once again, we see the
substitution that was matched from the left-hand side. Like before, a variable
will only appear here if it is used in the side condition.

Debugging rule matching

Sometimes it is useful to try to determine why a particular rule did or did
not apply. K provides some basic debugging commands which make it easier
to determine this.

Consider the following K definition (lesson-19-d.k):

module LESSON-19-D

  syntax Foo ::= foo(Bar)
  syntax Bar ::= bar(Baz) | bar2(Baz)
  syntax Baz ::= baz() | baz2()

  rule [baz]: foo(bar(baz())) => .K

endmodule

Suppose we try to run the program foo(bar(baz2())). It is obvious from this
example why the rule in this definition will not apply. However, in practice,
such cases are not always obvious. You might look at a rule and not immediately
spot why it didn't apply on a particular term. For this reason, it can be
useful to get the debugger to provide a log about how it tried to match that
term. You can do this with the k match command. If you are stopped after
having run k start or k step, you can obtain this log for any rule after
any step by running the command k match MODULE.label subject for a particular
top-level rule label.

For example, with the baz rule above, we get the following output:

(gdb) k match LESSON-19-D.baz subject
Subject:
baz2 ( )
does not match pattern:
baz ( )

As we can see, it provided the exact subterm which did not match against the
rule, as well as the particular subpattern it ought to have matched against.

This command does not actually take any rewrite steps. In the event that
matching actually succeeds, you will still need to run the k step command
to advance to the next step.

Final notes

In addition to the functionality provided above, you have the full power of
GDB at your disposal when debugging. Some features are not particularly
well-adapted to K code and may require more advanced knowledge of the
term representation or implementation to use effectively, but anything that
can be done in GDB can in theory be done using this debugging functionality.
We suggest you refer to the
GDB Documentation if you
want to try to do something and are unsure as to how.

Exercises

  1. Compile your solution to Lesson 1.18, Problem 2 with debugging support
    enabled and step through several programs you have previously used to test.
    Then set a breakpoint on the isKResult function and observe the state of the
    interpreter when stopped at that breakpoint. Set a breakpoint on the rule for
    addition and run a program that causes it to be stopped at that breakpoint.
    Finally, step through the program until the addition symbol is at the top
    of the K cell, and then use the k match command to report the reason why
    the subtraction rule does not apply. You may need to modify the definition
    to insert some rule labels.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.20: K Backends and the Haskell Backend.

Lesson 1.20: K Backends and the Haskell Backend

The purpose of this lesson is to teach about the multiple backends of K,
in particular the Haskell Backend which is the complement of the backend we
have been using so far.

K Backends

Thus far, we have not discussed the distinction between the K frontend and
the K backends at all. We have simply assumed that if you run kompile on a
K definition, there will be a compiler backend that will allow you to execute
the K definition you have compiled.

K actually has multiple different backends. The one we have been using so far
implicitly, the default backend, is called the LLVM Backend. It is
designed to support efficient, optimized concrete execution and search. It
does this by compiling your K definition to LLVM bitcode and then using LLVM
to generate machine code for it, which is then linked and executed.
However, K is a formal methods toolkit at the end of the day, and the primary
goal many people have when defining a programming language in K is to
ultimately be able to perform more advanced verification on programs in their
programming language.

It is for this purpose that K also provides the Haskell Backend, so called
because it is implemented in Haskell. While we will cover the features of the
Haskell Backend in more detail in the next two lessons, the important thing to
understand is that it is a separate backend which is optimized for more formal
reasoning about programming languages. While it is capable of performing
concrete execution, it does not do so as efficiently as the LLVM Backend.
In exchange, it provides more advanced features.

Choosing a backend

You can choose which backend to use to compile a K definition by means of the
--backend flag to kompile. Omitting this flag is equivalent to specifying
--backend llvm. To use the Haskell Backend instead, you can simply run
kompile --backend haskell on a particular K definition.

As an example, here is a simple K definition that we have seen before in the
previous lesson (lesson-20.k):

module LESSON-20
  imports INT

  rule I => I +Int 1
    requires I <Int 100
endmodule

Previously we compiled this definition using the LLVM Backend, but if we
instead execute the command kompile lesson-20.k --backend haskell, we
will get an interpreter for this K definition that is implemented in Haskell
instead. Unlike the default LLVM Backend, the Haskell Backend is not a
compiler per se: it does not generate new Haskell code corresponding to your
programming language and then compile and execute it. Instead, it is an
interpreter, written in Haskell, that reads the IR generated by kompile and
is capable of interpreting any K definition.

Note that on arm64 macOS (Apple Silicon), there is a known issue with the Compact
library that causes crashes in the Haskell backend. Pass the additional flag
--no-haskell-binary to kompile to resolve this.

Exercise

Try running the program 0 in this K definition on the Haskell Backend and
compare the final configuration to what you would get compiling the same
definition with the LLVM Backend.

Legacy backends

As a quick note, K does provide one other backend, which exists primarily as
legacy code which should be considered deprecated. This is the
Java Backend. The Java Backend is essentially a precursor to the Haskell
Backend. We will not cover this backend in any detail since it is deprecated,
but we still mention it here for completeness.

Exercises

  1. Compile your solution to Lesson 1.18, Problem 2 with the Haskell Backend
    and execute some programs. Compare the resulting configurations with the
    output of the same program on the LLVM Backend. Note that if you are getting
    different behaviors on the Haskell backend, you might have some luck debugging
    by passing --search to krun when using the LLVM backend.

Next lesson

Once you have completed the above exercises, you can continue to
Lesson 1.21: Unification and Symbolic Execution.

Lesson 1.21: Unification and Symbolic Execution

The purpose of this lesson is to teach the basic concepts of symbolic execution
in order to introduce the unique capabilities of the Haskell Backend at a
conceptual level.

Symbolic Execution

Thus far, all of the programs we have run using K have been executed on
concrete configurations. What this means is that the configuration we use to
initialize the K rewrite engine is concrete; in other words, it contains no
logical variables. The LLVM Backend is a concrete execution engine, meaning
that it is only capable of rewriting concrete configurations.

By contrast, the Haskell Backend performs symbolic execution, which is
capable of rewriting any configuration, including those where parts of the
configuration are symbolic, i.e., contain variables or uninterpreted
functions.

Unification

Previously, we have introduced the concept that K rewrite rules operate by
means of pattern matching: the current configuration being rewritten is pattern
matched against the left-hand side of the rewrite rule, and the substitution
is used in order to construct a new term from the right-hand side. In symbolic
execution, we use
unification
instead of pattern matching. To summarize, unification behaves akin to a
two-way pattern matching where both the configuration and the left-hand side
of the rule can contain variables, and the algorithm generates a
most general unifier containing substitutions for the variables in both
which will make both terms equal.

Feasibility

Unification by itself cannot completely solve the problem of symbolic
execution. One task symbolic execution must perform is to identify whether
a particular symbolic term is feasible, that is to say, that there actually
exists a concrete instantiation of that term such that all the logical
constraints on that term can actually be satisfied. The Haskell Backend
delegates this task to Z3, an
SMT solver.
This solver is used to periodically trim configurations that are determined
to be mathematically infeasible.

Symbolic terms

The final component of symbolic execution consists of the task of introducing
symbolic terms into the configuration. This can be done one of two different
ways. First, the term being passed to krun can actually be symbolic. This
is less frequently used because it requires the user to construct an AST
that contains variables, something which our current parsing capabilities are
not well-equipped to do. The second, more common, way of introducing symbolic
terms into a configuration consists of writing rules with an existentially
quantified variable on the right-hand side of the rule that does not appear
on the left-hand side.

In order to prevent users from writing such rules by accident, K requires
that such variables begin with the ? prefix. For example, here is a rule
that rewrites a constructor foo to a symbolic integer:

rule <k> foo => ?X:Int ...</k>

When this rule applies, a fresh variable is introduced into the configuration,
which is then unified against the rules that might apply in order to
symbolically execute that configuration.

ensures clauses

We also introduce here a new feature of K rules that applies when a rule
has this type of variable on the right-hand side: the ensures clause.
An ensures clause is similar to a requires clause and can appear after
a rule body, or after a requires clause. The ensures clause is used to
introduce constraints that might apply to the variable that was introduced by
that rule. For example, we could write the rule above with the additional
constraint that the symbolic integer that was introduced must be less than
five, by means of the following rule:

rule <k> foo => ?X:Int ...</k> ensures ?X <Int 5

Putting it all together

Putting all these pieces together, it is possible to use the Haskell Backend
to perform symbolic reasoning about a particular K module, determining all the
possible states that can be reached by a symbolic configuration.

For example, consider the following K definition (lesson-21.k):

module LESSON-21
    imports INT

    rule <k> 0 => ?X:Int ... </k> ensures ?X =/=Int 0
    rule <k> X:Int => 5  ... </k> requires X >=Int 10
endmodule

When we symbolically execute the program 0, we get the following output
from the Haskell Backend:

    <k>
      5 ~> .
    </k>
  #And
    {
      true
    #Equals
      ?X:Int >=Int 10
    }
  #And
    #Not ( {
      ?X:Int
    #Equals
      0
    } )
#Or
    <k>
      ?X:Int ~> .
    </k>
  #And
    #Not ( {
      true
    #Equals
      ?X:Int >=Int 10
    } )
  #And
    #Not ( {
      ?X:Int
    #Equals
      0
    } )

Note some new symbols introduced by this configuration: #And, #Or, and
#Equals. While andBool, orBool, and ==K represent functions of sort
Bool, #And, #Or, and #Equals are matching logic connectives. We
will discuss matching logic in more detail later in the tutorial, but the basic
idea is that these symbols represent Boolean operators over the domain of
configurations and constraints, as opposed to over the Bool sort.

Notice that the configuration listed above is a disjunction of conjunctions.
This is the most common form of output that can be produced by the Haskell
Backend. In this case, each conjunction consists of a configuration and a set
of constraints. What this conjunction describes, essentially, is a
configuration and a set of information that was derived to be true while
rewriting that configuration.

Similar to how we saw --search in a previous lesson, the reason we have
multiple disjuncts is that there are multiple possible output states
for this program, depending on whether or not the second rule applied. In the
first case, we see that ?X is greater than or equal to 10, so the second rule
applied, rewriting the symbolic integer to the concrete integer 5. In the
second case, we see that the second rule did not apply because ?X is less
than 10. Moreover, because of the ensures clause on the first rule, we know
that ?X is not zero, therefore the first rule will not apply a second time.
If we had omitted this constraint, we would have ended up infinitely applying
the first rule, leading to krun not terminating.

In the next lesson, we will cover how symbolic execution forms the backbone
of deductive program verification in K and how we can use K to prove programs
correct against a specification.

Exercises

  1. Create another rule in LESSON-21 that rewrites odd integers greater than
    ten to a symbolic even integer less than 10 and greater than 0. This rule will
    now apply nondeterministically along with the existing rules. Predict what the
    resulting output configuration will be from rewriting 0 after adding this
    rule. Then run the program and see whether your prediction is correct.

Once you have completed the above exercises, you can continue to
Lesson 1.22: Basics of Deductive Program Verification using K.

K PL Tutorial

Here you will learn how to use the K tool to define languages by means of a series of screencast movies. It is recommended to do these in the indicated order, because K features already discussed in a previous language definition will likely not be rediscussed in later definitions. The screencasts follow quite closely the structure of the files under the tutorial folder in the K tool distribution. If you'd rather follow the instructions there and do the tutorial exercises yourself, then go back to https://kframework.org and download the K tool, if you have not done it already. Or, you can first watch the screencasts below and then do the exercises, or do them in parallel.

K Overview

Make sure you watch the K overview video before you do the K tutorial:

Learning K

[34'46"] Part 1: Defining LAMBDA

Here you will learn how to define a very simple functional language in K and the basics of how to use the K tool. The language is a call-by-value variant of lambda calculus with builtins and mu, and its definition is based on substitution.

[37'07"] Part 2: Defining IMP

Here you will learn how to define a very simple, prototypical textbook C-like imperative language, called IMP, and several new features of the K tool.

[33'10"] Part 3: Defining LAMBDA++

Here you will learn how to define constructs which abruptly change the execution control, as well as how to define functional languages using environments and closures. LAMBDA++ extends the LAMBDA language above with a callcc construct.

[46'46"] Part 4: Defining IMP++

Here you will learn how to refine configurations, how to generate fresh elements, how to tag syntactic constructs and rules, how to exhaustively search the space of non-deterministic or concurrent program executions, etc. IMP++ extends the IMP language above with increment, blocks and locals, dynamic threads, input/output, and abrupt termination.

[17'03"] Part 5: Defining Type Systems

Here you will learn how to define various kinds of type systems following various approaches or styles using K.

[??'??"] Part 6: Miscellaneous Other K Features

Here you will learn a few other K features, and better understand how features that you have already seen work.

  • [??'??"] ...

Learning Language Design and Semantics using K

[??'??"] Part 7: SIMPLE: Designing Imperative Programming Languages

Here you will learn how to design imperative programming languages using K. SIMPLE is an imperative language with functions, threads, pointers, exceptions, multi-dimensional arrays, etc. We first define an untyped version of SIMPLE, then a typed version. For the typed version, we define both a static and a dynamic semantics.

[??'??"] Part 8: KOOL: Designing Object-Oriented Programming Languages

Here you will learn how to design object-oriented programming languages using K. KOOL is an object-oriented language that extends SIMPLE with classes and objects. We first define an untyped version of KOOL, then a typed version, with both a dynamic and a static semantics.

[??'??"] Part 9: FUN: Designing Functional Programming Languages

Here you will learn how to design functional programming languages using K. FUN is a higher-order functional language with general let, letrec, pattern matching, references, lists, callcc, etc. We first define an untyped version of FUN, then a let-polymorphic type inferencer.

[??'??"] Part 10: LOGIK: Designing Logic Programming Languages

Here you will learn how to design a logic programming language using K.

K overview

Go to Youtube mirror, if the above does not work.

Go back to https://kframework.org for further links, the K tool and contact information.

Learning K

We start by introducing the basic features of K by means of a series
of very simple languages. The objective here is neither to learn those
languages nor to study their underlying paradigm, but simply to learn K.

  • LAMBDA: Lambda calculus defined.
  • IMP: A simple imperative language.
  • LAMBDA++: LAMBDA extended with control flow.
  • IMP++: IMP extended with threads and IO.
  • TYPES: LAMBDA type system.

Part 1: Defining LAMBDA

Here you will learn how to define a very simple language in K and the basics
of how to use the K tool. The language is a variant of call-by-value lambda
calculus and its definition is based on substitution. Specifically, you will
learn the following:

  • How to define a module.
  • How to define a language syntax.
  • How to use the defined syntax to parse programs.
  • How to import predefined modules.
  • How to define evaluation strategies using strictness attributes.
  • How to define semantic rules.
  • How the predefined generic substitution works.
  • How to generate PDF and HTML documentation from ASCII definitions.
  • How to include builtins (integers and Booleans) into your language.
  • How to define derived language constructs.

This folder contains several lessons, each adding new features to LAMBDA.

Syntax Modules and Basic K Commands

Here we define our first K module, which contains the initial syntax of the
LAMBDA language, and learn how to use the basic K commands.

Let us create an empty working folder, and open a terminal window
(to the left) and an editor window (to the right). We will edit our K
definition in the right window in a file called lambda.k, and will call
the K tool commands in the left window.

Let us start by defining a K module, containing the syntax of LAMBDA.

K modules are introduced with the keywords module ... endmodule.

The keyword syntax adds new productions to the syntax grammar, using a
BNF-like notation.

Terminals are enclosed in double-quotes, like strings.

You can define multiple productions for the same non-terminal in the same
syntax declaration using the | separator.

Productions can have attributes, which are enclosed in square brackets.

The attribute left tells the parser that we want the lambda application to be
left associative. For example, a b c d will then parse as (((a b) c) d).

The attribute bracket tells the parser not to generate a node for the
parenthesis production in the abstract syntax trees associated to programs.
In other words, we want to allow parentheses to be used for grouping, but we
do not want to bother giving them their obvious (identity) semantics.

In our variant of lambda calculus defined here, identifiers and lambda
abstractions are meant to be irreducible, that is, are meant to be values.
However, so far Val is just another non-terminal, just like Exp,
without any semantic meaning. It will get a semantic meaning later.
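
For orientation, here is a minimal sketch of what lambda.k might look like at
this point (a sketch, not the verbatim tutorial file; Id is a builtin sort):

module LAMBDA
  syntax Val ::= Id
               | "lambda" Id "." Exp
  syntax Exp ::= Val
               | Exp Exp       [left]
               | "(" Exp ")"   [bracket]
endmodule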

After we are done typing our definition in the file lambda.k, we can kompile
it with the command:

kompile lambda.k

If we get no errors then a parser has been generated. This parser will be
called from now on by default by the krun tool. To see whether and how the
parser works, we are going to write some LAMBDA programs and store them in
files with the extension .lambda.

Let us create a file identity.lambda, which contains the identity lambda
abstraction:

lambda x . x

Now let us call krun on identity.lambda:

krun identity.lambda

Make sure you call the krun command from the folder containing your language
definition (otherwise type krun --help to learn how to pass a language
definition as a parameter to krun). The krun command produces the output:

<k>
  lambda x . x
</k>

If you see such an output it means that your program has been parsed (and then
pretty printed) correctly. If you want to see the internal abstract syntax
tree (AST) representation of the parsed program, which we call the K AST, then
type kast in the command instead of krun:

kast identity.lambda

You should normally never need to see this internal representation in your
K definitions, so do not get scared (yes, it is ugly for humans, but it is
very convenient for tools).

Note that krun placed the program in a <k> ... </k> cell. In K, computations
happen only in cells. If you do not define a configuration in your definition,
like we did here, then a configuration will be created automatically for you
which contains only one cell, the default k cell, which holds the program.

Next, let us create a file free-variable-capture.lambda, which contains an
expression that, in order to execute correctly in a substitution-based
semantics of LAMBDA, requires the substitution operation to avoid
variable capture:

a (((lambda x.lambda y.x) y) z)

Next, file closed-variable-capture.lambda shows an expression which also
requires a capture-free substitution, but this expression is closed (that is,
it has no free variables) and all its bound variables are distinct (I believe
this is the smallest such expression):

(lambda z.(z z)) (lambda x.lambda y.(x y))

Finally, the file omega.lambda contains the classic omega combinator
(or closed expression), which is the smallest expression which loops forever
(not now, but after we define the semantics of LAMBDA):

(lambda x.(x x)) (lambda x.(x x))

Feel free to define and parse several other LAMBDA programs to get a feel for
how the parser works. Parse also some incorrect programs, to see how the
parser generates error messages.

In the next lesson we will see how to define semantic rules that iteratively
rewrite expressions over the defined syntax until they evaluate to a result.
This way, we obtain our first programming language defined using K.

Go to Lesson 2, LAMBDA: Module Importing, Rules, Variables

MOVIE (out of date) [4'07"]

Module Importing, Rules, Variables

We here learn how to include a predefined module (SUBSTITUTION), how to
use it to define a K rule (the characteristic rule of lambda calculus),
and how to make proper use of variables in rules.

Let us continue our lambda.k definition started in the previous lesson.

The requires keyword takes a .k file containing language features that
are needed for the current definition, which can be found in the
k-distribution/include/kframework/builtin folder. Thus, the command

require "substitution.k"

says that the subsequent definition of LAMBDA needs the generic substitution,
which is predefined in file substitution.k under the folder
k-distribution/include/kframework/builtin. Note that substitution can itself
be defined in K, although it uses advanced features that we have not yet
discussed in this tutorial, so it may not be easy to understand now.

Using the imports keyword, we can now modify LAMBDA to import the module
SUBSTITUTION, which is defined in the required substitution.k file.

Now we have all the substitution machinery available for our definition.
However, since our substitution is generic, it cannot know which language
constructs bind variables, and what counts as a variable; however, this
information is critical in order to correctly solve the variable capture
problem. Thus, you have to tell the substitution that your lambda construct
is meant to be a binder, and that your Id terms should be treated as variables
for substitution. The former is done using the attribute binder.
By default, binder binds all the variables occurring anywhere in the first
argument of the corresponding syntactic construct within its other arguments;
you can configure which arguments are bound where, but that will be discussed
in subsequent lectures. To tell K which terms are meant to act as variables
for binding and substitution, we have to explicitly subsort the desired syntactic
categories to the builtin KVariable sort.

Now we are ready to define our first K rule. Rules are introduced with the
keyword rule and make use of the rewrite symbol, =>. In our case,
the rule defines the so-called lambda calculus beta-reduction, which
makes use of substitution in its right-hand side, as shown in lambda.k.
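
For reference, here is a sketch of how the pieces described in this lesson fit
into lambda.k (the actual file may differ in details):

requires "substitution.k"

module LAMBDA
  imports SUBSTITUTION
  // ... syntax from the previous lesson, with lambda now marked as a binder:
  syntax Val ::= Id
               | "lambda" Id "." Exp  [binder]
  syntax KVariable ::= Id   // Id terms act as variables for substitution
  rule (lambda X:Id . E:Exp) V:Val => E[V / X]   // beta reduction
endmodule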

By convention, variables that appear in rules start with a capital letter
(the current implementation of the K tool may even enforce that).

Variables may be explicitly tagged with their syntactic category (also called
sort). If tagged, the matching term will be checked at run-time for
membership in the claimed sort. If not tagged, then no check will be made.
The former is safer, but involves the generation of a side condition to the
rule, so the resulting definition may execute slightly slower overall.

In our rule in lambda.k we tagged all variables with their sorts, so we chose
the safest path. Only the V variable really needs to be tagged there,
because we can prove (using other means, not the K tool, as the K tool is not
yet concerned with proving) that the first two variables will always have the
claimed sorts whenever we execute any expression that parses within our
original grammar.

Let us compile the definition and then run some programs. For example,

krun closed-variable-capture.lambda

yields the output

<k>
  lambda y . ((lambda x . (lambda y . (x  y))) y)
</k> 

Notice that only certain programs reduce (some even yield non-termination,
such as omega.lambda), while others do not. For example,
free-variable-capture.lambda does not reduce its second argument expression
to y, as we would expect. This is because the K rewrite rules between syntactic
terms do not apply anywhere they match. They only apply where they have been
given permission to apply by means of appropriate evaluation strategies of language
constructs, which is done using strictness attributes, evaluation contexts,
heating/cooling rules, etc., as discussed in the next lessons.

The next lesson will show how to add to LAMBDA the desired evaluation
strategies using strictness attributes.

Go to Lesson 3, LAMBDA: Evaluation Strategies using Strictness

MOVIE (out of date) [4'03"]

Evaluation Strategies using Strictness

Here we learn how to use the K strict attribute to define desired evaluation
strategies. We will also learn how to tell K which terms are already
evaluated, so it does not attempt to evaluate them anymore and treats them
internally as results of computations.

Recall from the previous lecture that the LAMBDA program
free-variable-capture.lambda was stuck, because K was not given permission
to evaluate the arguments of the lambda application construct.

You can use the attribute strict to tell K that the corresponding construct
has a strict evaluation strategy, that is, that its arguments need to be
evaluated before the semantics of the construct applies. The order of
argument evaluation is purposely unspecified when using strict, and indeed
the K tool allows us to detect all possible non-deterministic behaviors that
result from such intended underspecification of evaluation strategies. We will
learn how to do that when we define the IMP language later in this tutorial;
we will also learn how to enforce a particular order of evaluation.

In order for the above strictness declaration to work effectively and
efficiently, we need to tell the K tool which expressions are meant to be
results of computations, so that it will not attempt to evaluate them anymore.
One way to do it is to make Val a syntactic subcategory of the builtin
KResult syntactic category. Since we use the same K parser to also parse
the semantics, we use the same syntax keyword to define additional syntax
needed exclusively for the semantics (like KResults). See lambda.k.
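
As a sketch, the two changes described above might look like this (the actual
lambda.k attaches the attributes to the productions it already declares):

syntax Exp ::= Exp Exp  [strict, left]   // evaluate both sides of an application
syntax KResult ::= Val                   // values need no further evaluation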

Compile again and then run some programs. They should all work as expected.
In particular, free-variable-capture.lambda now evaluates to the term a y.

We now got a complete and working semantic definition of call-by-value
lambda-calculus. While theoretically correct, our definition is not
easy to use and disseminate. In the next lessons we will learn how to
generate formatted documentation for LAMBDA and how to extend LAMBDA
in order to write human readable and interesting programs.

Go to Lesson 4, LAMBDA: Generating Documentation; Latex Attributes.

MOVIE (out of date) [2'20"]

Generating Documentation; Latex Attributes

In this lesson we learn how to generate formatted documentation from K
language definitions. We also learn how to use Latex attributes to control
the formatting of language constructs, particularly of ones which have a
mathematical flavor and we want to display accordingly.

To enhance readability, we may want to replace the keyword lambda by the
mathematical lambda symbol in the generated documentation. We can control
the way we display language constructs in the generated documentation
by associating Latex attributes with them.

This is actually quite easy. All we have to do is to associate a latex
attribute to the production defining the construct in question, following
the Latex syntax for defining new commands (or macros).

In our case, we associate the attribute latex(\lambda{#1}.{#2}) to the
production declaring the lambda abstraction (recall that in Latex, #n refers
to the n-th argument of the defined new command).
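
Concretely, the lambda production might now read (a sketch):

syntax Val ::= "lambda" Id "." Exp  [binder, latex(\lambda{#1}.{#2})]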

We will later see, in Lesson 9, that we can add arbitrarily complex Latex
comments and headers to our language definitions, which give us maximum
flexibility in formatting our language definitions.

Now we have a simple programming language, with a nice documentation. However,
it is not easy to write interesting programs in this language. Almost all
programming languages build upon existing data-types and libraries. The K
tool provides a few of these (and you can add more).

In the next lesson we show how we can add builtin integers and Booleans to
LAMBDA, so we can start to evaluate meaningful expressions.

Go to Lesson 5, LAMBDA: Adding Builtins; Side Conditions.

MOVIE (out of date) [3'13"]

Adding Builtins; Side Conditions

We have already added the builtin identifiers (sort Id) to LAMBDA expressions,
but those had no operations on them. In this lesson we add integers and
Booleans to LAMBDA, and extend the builtin operations on them into
corresponding operations on LAMBDA expressions. We will also learn how to add
side conditions to rules, to limit the number of instances where they can
apply.

The K tool provides several builtins, which are automatically included in all
definitions. These can be used in the languages that we define, typically by
including them in the desired syntactic categories. You can also define your
own builtins in case the provided ones are not suitable for your language
(e.g., the provided builtin integers and operations on them are arbitrary
precision).

For example, to add integers and Booleans as values to our LAMBDA, we have to
add the productions

syntax Val ::= Int | Bool

Int and Bool are the nonterminals that correspond to these builtins.

To make use of these builtins, we have to add some arithmetic operation
constructs to our language. We prefer to use the conventional infix notation
for these, and the usual precedences (i.e., multiplication and division bind
tighter than addition, which binds tighter than relational operators).
Inspired by SDF, we use > instead of
| to state that all the previous constructs bind tighter than all the
subsequent ones. See lambda.k.

The only thing left is to link the LAMBDA arithmetic operations to the
corresponding builtin operations, when their arguments are evaluated.
This can be easily done using trivial rewrite rules, as shown in lambda.k.
In general, the K tool attempts to uniformly add the corresponding builtin
name as a suffix to all the operations over builtins. For example, the
addition over integers is an infix operation named +Int.
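
As a sketch, the new syntax and the linking rules might look like this (the
actual lambda.k may group the productions and precedences differently):

syntax Exp ::= Exp "*" Exp   [strict, left]
             | Exp "/" Exp   [strict]
             > Exp "+" Exp   [strict, left]
             > Exp "<=" Exp  [strict]

rule I1:Int * I2:Int  => I1 *Int I2
rule I1:Int / I2:Int  => I1 /Int I2   // refined with a side condition below
rule I1:Int + I2:Int  => I1 +Int I2
rule I1:Int <= I2:Int => I1 <=Int I2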

Compile the new lambda.k definition and evaluate some simple arithmetic
expressions. For example, if arithmetic.lambda is (1+2*3)/4 <= 1, then

krun arithmetic.lambda

yields, as expected, true. Note that the parser took the desired operation
precedence into account.

Let us now try to evaluate an expression which performs a wrong computation,
namely a division by zero. Consider the expression arithmetic-div-zero.lambda
which is 1/(2/3). Since division is strict and 2/3 evaluates to 0, this
expression reduces to 1/0, which further reduces to 1 /Int 0 by the rule for
division, which is now stuck (with the current back-end to the K tool).

In fact, depending upon the back-end that we use to execute K definitions and
in particular to evaluate expressions over builtins, 1 /Int 0 can evaluate to
anything. It just happens that the current back-end keeps it as an
irreducible term. Other K back-ends may reduce it to an explicit error
element, or issue a segmentation fault followed by a core dump, or throw an
exception, etc.

To avoid requesting the back-end to perform an illegal operation, we may use a
side condition in the rule of division, to make sure it only applies when the
denominator is non-zero.
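
For reference, the guarded division rule might look like this (a sketch):

rule I1:Int / I2:Int => I1 /Int I2  requires I2 =/=Int 0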

Like in other operational formalisms, the role of the K side
conditions is to filter the instances of the rule. The notion
of a side condition comes from logics, where a sharp distinction is made
between a side condition (cheap) and a premise (expensive). Premises are
usually resolved using further (expensive) logical derivations, while side
conditions are simple (cheap) conditions over the rule meta-variables within
the underlying mathematical domains (which in K can be extended by the user,
as we will see in future lessons). Regarded as a logic, K derives rewrite
rules from other rewrite rules; therefore, the K side conditions cannot
contain other rewrites in them (using =>). This contrasts other rewrite
engines, for example Maude, which
allow conditional rules with rewrites in conditions.
The rationale behind this deliberate restriction in K is twofold:

  • On the one hand, general conditional rules require a complex, and thus slower
    rewrite engine, which starts recursive (sometimes exhaustive) rewrite sessions
    to resolve the rewrites in conditions. In contrast, the side conditions in K
    can be evaluated efficiently by back-ends, for example by evaluating builtin
    expressions and/or by calling builtin functions.
  • On the other hand, the semantic definitional philosophy of K is that rule
    premises are unnecessary, so there is no need to provide support for them.

Having builtin arithmetic is useful, but writing programs with just lambda
and arithmetic constructs is still a pain. In the next two lessons we will
add conditional (if_then_else) and binding (let and letrec) constructs,
which will allow us to write nicer programs.

Go to Lesson 6, LAMBDA: Selective Strictness; Anonymous Variables.

MOVIE (out of date) [4'52"]

Selective Strictness; Anonymous Variables

We here show how to define selective strictness of language constructs,
that is, how to state that certain language constructs are strict only
in some arguments. We also show how to use anonymous variables.

We next define a conditional if construct, which takes three arguments,
evaluates only the first one, and then reduces to either the second or the
third, depending on whether the first one evaluated to true or to false.

K allows us to define selective strictness using the same strict attribute,
by passing it a list of numbers. The numbers correspond to the arguments
in which we want the defined construct to be strict. In our case,

syntax Exp ::= "if" Exp "then" Exp "else" Exp   [strict(1)]

states that the conditional construct is strict in the first argument.

We can now assume that its first argument will eventually reduce to a value, so
we only write the following two semantic rules:

rule if true  then E else _ => E
rule if false then _ else E => E

Thus, we assume that the first argument evaluates to either true or false.

Note the use of the anonymous variable _. We use such variables purely for
structural reasons, to state that something is there but we don't care what.
An anonymous variable is therefore completely equivalent to a normal variable
which is unsorted and different from all the other variables in the rule. If
you use _ multiple times in a rule, they will all be considered distinct.

Compile lambda.k and write and execute some interesting expressions making
use of the conditional construct. For example, the expression

if 2<=1 then 3/0 else 10

evaluates to 10 and will never evaluate 3/0, thus avoiding an unwanted
division-by-zero.

In the next lesson we will introduce two new language constructs, called
let and letrec and conventionally found in functional programming
languages, which will allow us to already write interesting LAMBDA programs.

Go to Lesson 7, LAMBDA: Derived Constructs; Extending Predefined Syntax.

MOVIE (out of date) [2'14"]

Derived Constructs; Extending Predefined Syntax

In this lesson we will learn how to define derived language constructs, that
is, ones whose semantics is defined completely in terms of other language
constructs. We will also learn how to add new constructs to predefined
syntactic categories.

When defining a language, we often want certain language constructs to be
defined in terms of other constructs. For example, a let-binding construct
of the form

let x = e in e'

is nothing but syntactic sugar for

(lambda x . e') e

This can be easily achieved with a rule, as shown in lambda.k.
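
A sketch of that desugaring (variable names are illustrative):

syntax Exp ::= "let" Id "=" Exp "in" Exp
rule let X:Id = E:Exp in E2:Exp => (lambda X . E2) E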

As a side point, which is not very relevant here but good to know, we may
want the desugaring of let to not even count as a computational step, but
as a mere structural rearrangement of the program so that other semantic
rules (beta reduction, in our case) can match and apply.

The K tool allows us to tag rules with the attribute structural, with
precisely the intuition above. You can think of structural rules as a kind
of light rules, almost like macros, or like ones which apply under the hood,
instantaneously. There are several other uses for structural rules in K,
which we will discuss later in this tutorial.

Compile lambda.k and write some programs using let binders.

For example, consider a lets.lambda program which takes arithmetic.lambda
and replaces each integer by a let-bound variable. It should evaluate to
true, just like the original arithmetic.lambda.

Let us now consider a more interesting program, namely one that calculates the
factorial of 10:

let f = lambda x . (
        (lambda t . lambda x . (t t x))
        (lambda f . lambda x . (if x <= 1 then 1 else (x * (f f (x + -1)))))
        x
      )
in (f 10)

This program follows a common technique to define fixed points in untyped
lambda calculus, based on passing a function to itself.

We may not like to define fixed-points following the approach above, because
it requires global changes in the body of the function meant to be recursive,
basically to pass it to itself (f f in our case above). The approach below
isolates the fixed-point aspect of the function in a so-called fixed-point
combinator, which we call fix below, and then applies it to the function
defining the body of the factorial, without any changes to it:

let fix = lambda f . (
          (lambda x . (f (lambda y . (x x y))))
          (lambda x . (f (lambda y . (x x y))))
        )
in let f = fix (lambda f . lambda x .
                (if x <= 1 then 1 else (x * (f (x + -1)))))
   in (f 10)

Although the above techniques are interesting and powerful (indeed, untyped
lambda calculus is in fact Turing complete), programmers will probably not
like to write programs this way.

We can easily define a more complex derived construct, called letrec and
conventionally encountered in functional programming languages, whose semantics
captures the fixed-point idea above. In order to keep its definition simple
and intuitive, we define a simplified variant of letrec, namely one which only
allows us to define one recursive one-argument function. See lambda.k.

There are two interesting observations here.

First, note that we have already in-lined the definition of the fix
combinator in the definition of the factorial, to save one application of the
beta reduction rule (and the involved substitution steps). We could have
in-lined the definition of the remaining let, too, but we believe that the
current definition is easier to read.

Second, note that we extended the predefined Id syntactic category with two
new constants, $x and $y. The predefined identifiers cannot start with
$, so programs that will be executed with this semantics cannot possibly
contain the identifiers $x and $y. In other words, by adding them to Id they
become indirectly reserved for the semantics. This is indeed desirable,
because any possible uses of $x in the body of the function defined using
letrec would be captured by the lambda $x declaration in the definition of
letrec.

Using letrec, we can now write the factorial program as elegantly as it can
be written in a functional language:

letrec f x = if x <= 1 then 1 else (x * (f (x + -1)))
in (f 10)

In the next lesson we will discuss an alternative definition of letrec, based
on another binder, mu, specifically designed to define fixed points.

Go to Lesson 8, LAMBDA: Multiple Binding Constructs.

MOVIE (out of date) [5'10"]

Multiple Binding Constructs

Here we learn how multiple language constructs that bind variables can
coexist. We will also learn about or recall another famous binder besides
lambda, namely mu, which can be used to elegantly define all kinds of
interesting fixed-point constructs.

The mu binder has the same syntax as lambda, except that it replaces
lambda with mu.

Since mu is a binder, in order for substitution to know how to deal with
variable capture in the presence of mu, we have to tell it that mu is a
binding construct, just like lambda. While we are at it, we also give mu
its desired latex attribute.

The intuition for

mu x . e

is that it reduces to e, but each free occurrence of x in e behaves
like a pointer that points back to mu x . e.

With that in mind, let us postpone the definition of mu and instead redefine
letrec F X = E in E' as a derived construct, assuming mu is available. The
idea is to simply regard F as a fixed-point of the function

lambda X . E

that is, to first calculate

mu F . lambda X . E

and then to evaluate E' where F is bound to this fixed-point:

let F = mu F . lambda X . E in E'

This new definition of letrec may still look a bit tricky, particularly
because F is bound twice, but it is much simpler and cleaner than our
previous definition. Moreover, now it is done in a type-safe manner
(this aspect goes beyond our objective in this tutorial).
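
In K, this desugaring can be written roughly as follows (a sketch):

syntax Exp ::= "letrec" Id Id "=" Exp "in" Exp
rule letrec F:Id X:Id = E in E2 => let F = mu F . lambda X . E in E2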

Let us now define the semantic rule of mu.

The semantics of mu is actually disarmingly simple. We just have to
substitute mu X . E for each free occurrence of X in E:

mu X . E => E[(mu X . E) / X]

Compile lambda.k and execute some recursive programs. They should be now
several times faster. Write a few more recursive programs, for example ones
for calculating the Ackermann function, for calculating the number of moves
needed to solve the Hanoi tower problem, etc.

We have defined our first programming language in K, which allows us to
write interesting functional programs. In the next lesson we will learn how
to fully document our language definition, in order to disseminate it, to ship
it to colleagues or friends, to publish it, to teach it, and so on.

Go to Lesson 9, LAMBDA: A Complete and Commented Definition.

MOVIE (out of date) [2'40"]

A Complete and Documented K Definition

In this lesson you will learn how to add formal comments to your K definition,
in order to nicely document it. The generated document can be then used for
various purposes: to ease understanding the K definition, to publish it,
to send it to others, etc.

The K tool allows a literate programming style, where the executable
language definition can be documented by means of annotations. One such
annotation is the latex(_) annotation, where you can specify how to format
the given production when producing Latex output via the --output latex
option to krun, kast, and kprove.

There are three types of comments, which we discuss next.

Ordinary comments

These use // or /* ... */, like in various programming languages. These
comments are completely ignored.

Document annotations

Use the @ symbol right after // or /* in order for the comment to be
considered an annotation and thus be processed by the K tool when it
generates documentation.

As an example, we can go ahead and add such an annotation at the beginning
of the LAMBDA module, explaining how we define the syntax of this language.

Header annotations

Use the ! symbol right after // or /* if you want the comment to be
considered a header annotation, that is, one which goes before
\begin{document} in the generated Latex. You typically need header
annotations to include macros, or to define a title, etc.

As an example, let us set a Latex length and then add a title and an
author to this K definition.

Compile the documentation and take a look at the results. Notice the title.

Feel free to now add lots of annotations to lambda.k.

Then compile and check the result. Depending on your PDF viewer, you
may also see a nice click-able table of contents, with all the sections
of your document. This could be quite convenient when you define large
languages, because it helps you jump to any part of the semantics.

Tutorial 1 is now complete. The next tutorial will take us through the
definition of a simple imperative language and will expose us to more
features of the K framework and the K tool.

MOVIE (out of date) [6'07"]

Part 2: Defining IMP

Here you will learn how to define a very simple imperative language in K
and the basics of how to work with configurations, cells, and computations.
Specifically, you will learn the following:

  • How to define languages using multiple modules.
  • How to define sequentially strict syntactic constructs.
  • How to use K's syntactic lists.
  • How to define, initialize and configure configurations.
  • How the language syntax is swallowed by the builtin K syntactic category.
  • The additional syntax of the K syntactic category.
  • How the strictness annotations are automatically desugared into rules.
  • The first steps of the configuration abstraction mechanism.
  • The distinction between structural and computational rules.

Like in the previous tutorial, this folder contains several lessons, each
adding new features to IMP. Do them in order. Also, make sure you completed
and understood the previous tutorial.

Defining a More Complex Syntax

Here we learn how to define a more complex language syntax than LAMBDA's,
namely the C-like syntax of IMP. Also, we will learn how to define languages
using multiple modules, because we are going to separate IMP's syntax from
its semantics using modules. Finally, we will also learn how to use K's
builtin support for syntactic lists.

The K tool provides modules for grouping language features. In general, we
can organize our languages in arbitrarily complex module structures.
While there are no rigid requirements or even guidelines for how to group
language features in modules, we often separate the language syntax from the
language semantics in different modules.

In our case here, we start by defining two modules, IMP-SYNTAX and IMP, and
import the first in the second, using the keyword imports. As their names
suggest, we will place all IMP's syntax definition in IMP-SYNTAX and all its
semantics in IMP.

Note, however, that K does no more than simply include all the
contents of the imported module in the one which imports it (making sure
that everything is only kept once, even if you import it multiple times).
In other words, there is currently nothing fancy in the K tool's module system.

IMP has six syntactic categories, as shown in imp.k: AExp for arithmetic
expressions, BExp for Boolean expressions, Block for blocks, Stmt for
statements, Pgm for programs and Ids for comma-separated lists of
identifiers. Blocks are special statements, whose role is to syntactically
constrain the conditional statement and the while loop statement to only
take blocks as branches and body, respectively.

There is nothing special about arithmetic and Boolean expressions. They
are given the expected strictness attributes, except for <= and &&,
for demonstration purposes.

The <= is defined to be seqstrict, which means that it evaluates its
arguments in order, from left-to-right (recall that the strict operators
can evaluate their arguments in any, fully interleaved, orders). Like
strict, the seqstrict annotation can also be configured; for example, one
can specify in which arguments and in what order. By default, seqstrict
refers to all the arguments, in their left-to-right order. In our case here,
it is equivalent to seqstrict(1 2).

The && is only strict in its first argument, because we will give it a
short-circuited semantics (its second argument will only be evaluated when
the first evaluates to true). Recall the K tool also allows us to associate
LaTex attributes to constructs, telling the document generator how to display
them. For example, we associate <= the attribute latex({#1}\leq{#2}),
which makes it be displayed \leq everywhere in the generated LaTex
documentation.
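
For reference, the corresponding productions in imp.k look roughly like this:

syntax BExp ::= AExp "<=" AExp  [seqstrict, latex({#1}\leq{#2})]
              | BExp "&&" BExp  [strict(1)]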

In this tutorial we take the liberty of giving the various constructs
parsing precedences that we have already tested and know to work well, so that
we can focus on the semantics here instead of the syntax. In practice, though,
you typically need to experiment with precedences until you obtain the desired
parser.

Blocks are defined using curly brackets, and they can either be empty or
hold a statement.

Nothing special about the IMP statements. Note that ; is an assignment
statement terminator, not a statement separator. Note also that blocks are
special statements.

An IMP program declares a comma-separated list of variables using the keyword
int like in C, followed by a semicolon ;, followed by a statement.
Syntactically, the idea here is that we can wrap any IMP program within a
main(){...} function and get a valid C program. IMP does not allow variable
declarations anywhere else except through this construct, at the top-level of
the program. Other languages provided with the K distribution (see, e.g., the
IMP++ language also discussed in this tutorial) remove this top-level program
construct of IMP and add instead variable declaration as a statement construct,
which can be used anywhere in the program, not only at the top level.

Note how we defined the comma-separated list of identifiers using
List{Id,","}. The K tool provides builtin support for generic syntactic
lists. In general,

syntax B ::= List{A,T}

declares a new non-terminal, B, corresponding to T-separated sequences of
elements of A, where A is a non-terminal and T is a terminal. These
lists can also be empty, that is, IMP programs declaring no variable are also
allowed (e.g., int; {} is a valid IMP program). To instantiate and use
the K builtin lists, you should alias each instance with a (typically fresh)
non-terminal in your syntax, like we do with the Ids nonterminal.
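
Concretely, the declaration in imp.k reads:

syntax Ids ::= List{Id,","}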

Like with other K features, there are ways to configure the syntactic lists,
but we do not discuss them here.

Recall from Tutorial 1 (LAMBDA) that in order for strictness to work well
we also need to tell K which computations are meant to be results. We do
this as well now, in the module IMP: integers and Booleans are K results.
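
Concretely, this is a one-line declaration in imp.k along the lines of:

syntax KResult ::= Int | Bool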

Kompile imp.k and test the generated parser by running some programs.
Since IMP is a fragment of C, you may want to select the C mode in your
editor when writing these programs. This will also give you the feel that
you are writing programs in a real programming language.

For example, here is sum.imp, which sums in sum all numbers up to n:

int n, sum;
n = 100;
sum=0;
while (!(n <= 0)) {
  sum = sum + n;
  n = n + -1;
}

Now krun it and see how it looks parsed in the default k cell.
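
Assuming imp.k and sum.imp are in the current directory, this amounts to:

kompile imp.k
krun sum.imp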

The program collatz.imp tests the Collatz conjecture for all numbers up to
m and accumulates the total number of steps in s:

int m, n, q, r, s;
m = 10;
while (!(m<=2)) {
  n = m;
  m = m + -1;
  while (!(n<=1)) {
    s = s+1;
    q = n/2;
    r = q+q+1;
    if (r<=n) {
      n = n+n+n+1;         // n becomes 3*n+1 if odd
    } else {n=q;}          //        or   n/2 if even
  }
}

Finally, program primes.imp counts in s all the prime numbers up to m:

int i, m, n, q, r, s, t, x, y, z;
m = 10;  n = 2;
while (n <= m) {
  // checking primality of n and setting t to 1 or 0
  i = 2;  q = n/i;  t = 1;
  while (i<=q && 1<=t) {
    x = i;
    y = q;
    // fast multiplication (base 2) algorithm
    z = 0;
    while (!(x <= 0)) {
      q = x/2;
      r = q+q+1;
      if (r <= x) { z = z+y; } else {}
      x = q;
      y = y+y;
    } // end fast multiplication
    if (n <= z) { t = 0; } else { i = i+1;  q = n/i; }
  } // end checking primality
  if (1 <= t) { s = s+1; } else {}
  n = n+1;
}

All the programs above will run once we define the semantics of IMP. If you
want to execute them now, wrap them in a main(){...} function, then compile
and run them with your favorite C compiler.

Before we move to the K semantics of IMP, we would like to make some
clarifications regarding the K builtin parser, kast. Although it is quite
powerful, you should not expect magic from it! While the K parser can parse
many non-trivial languages (see, for example, the KOOL language in
pl-tutorial/2_languages in the K distribution), it was
never meant to be a substitute for real parsers. We often call the syntax
defined in K the syntax of the semantics, to highlight the fact that its
role is to serve as a convenient notation when writing the semantics, not
necessarily as a means to define the concrete syntax of arbitrarily complex
programming languages. See the KERNELC language for an example of how to
connect an external parser for concrete syntax to the K tool.

The above being said, we strongly encourage you to strive to make the
builtin parser work with your desired language syntax! Do not give up
simply because you don't want to deal with syntactic problems. On the
contrary, fight for your syntax! If you really cannot define your desired
syntax because of tool limitations, we would like to know. Please tell us.

Until now we have only seen default configurations. In the next lesson we
will learn how to define a custom K configuration.

Go to Lesson 2, IMP: Defining a Configuration.

MOVIE (out of date) [09'15"]

Defining a Configuration

Here we learn how to define a configuration in K. We also learn how to
initialize and how to display it.

As explained in the overview presentation on K, configurations are quite
important, because all semantic rules match and apply on them.
Moreover, they are the backbone of configuration abstraction, which allows
you to only mention the relevant cells in each semantic rule, the rest of
the configuration context being inferred automatically. The importance of
configuration abstraction will become clear when we define more complex
languages (even in IMP++). IMP does not really need it. K configurations
are constructed making use of cells, which are labeled and can be arbitrarily
nested.

Configurations are defined with the keyword configuration. Cells are
defined using an XML-ish notation stating clearly where the cell starts
and where it ends.

While not enforced by the tool, we typically like to put the entire
configuration in a top-level cell, called T. So let's define it:

configuration <T>...</T>

Cells can have other cells inside. In our case of IMP, we need a cell to
hold the remaining program, a cell which we typically call k, and a cell to
hold the program state. Let us add them:

configuration <T> <k>...</k> <state>...</state> </T>

K also allows us to specify how to initialize a configuration at the same
time as declaring it. All we have to do is to fill in
the contents of the cells with some terms. The syntactic categories of
those terms will also indirectly define the types of the corresponding
cells.

For example, we want the k cell to initially hold the program that is passed
to krun. K provides a builtin configuration variable, called $PGM, which
is specifically designed for this purpose: krun will place its program there
(after parsing it, of course). The K tool allows users to define their own
configuration variables, too, which can be used to develop custom
initializations of program configurations with the help of krun; this can be
quite useful when defining complex languages, but we do not discuss it in
this tutorial.

configuration <T> <k> $PGM </k> <state>...</state>  </T>

Moreover, we want the program to be a proper Pgm term (because we do not
want to allow krun to take fragments of programs, for example, statements).
Therefore, we tag $PGM with the desired syntactic category, Pgm:

configuration <T> <k> $PGM:Pgm </k> <state>...</state>  </T>

Like for other variable tags in K, a run-time check will be performed and the
semantics will get stuck if the passed term is not a well-formed program.

We next tell K that the state cell should be initialized with the empty map:

configuration <T> <k> $PGM:Pgm </k> <state> .Map </state>  </T>

Recall that in K . stands for nothing. However, since there are various
types of nothing, to avoid confusion we can suffix the . with its desired
type. K has several builtin data-types, including lists, sets, bags, and
maps. .Map is the empty map.

Kompile imp.k and run several programs to see how the configuration is
initialized as desired.

When configurations get large, and they do when defining large programming
languages, you may want to color the cells in order to more easily distinguish
them. This can be easily achieved using the color cell attribute, following
again an XML-ish style:

configuration <T color="yellow">
                <k color="green"> $PGM:Pgm </k>
                <state color="red"> .Map </state>
              </T>

In the next lesson we will learn how to write rules that involve cells.

Go to Lesson 3, IMP: Computations, Results, Strictness; Rules Involving Cells.

MOVIE (out of date) [04'21"]

Computations, Results, Strictness; Rules Involving Cells

In this lesson we will learn about the syntactic category K of computations,
about how strictness attributes are in fact syntactic sugar for rewrite rules
over computations, and why it is important to tell the tool which
computations are results. We will also see a K rule that involves cells.

K Computations

Computation structures, or more simply computations, extend the abstract
syntax of your language with a list structure using ~> (read followed by
or and then, and written \curvearrowright in LaTeX) as a separator.
K provides a distinguished sort, K, for computations. The extension of the
abstract syntax of your language into computations is done automatically by
the K tool when you declare constructs using the syntax keyword, so the K
semantic rules can uniformly operate only on terms of sort K. The intuition
for computation structures of the form

t1 ~> t2 ~> ... ~> tn

is that the listed tasks are to be processed in order. The initial
computation typically contains the original program as its sole task, but
rules can then modify it into task sequences, as seen shortly.

Strictness in Theory

The strictness attributes, used as annotations to language constructs,
actually correspond to rules over computations. For example, the
strict(2) attribute of the assignment statement corresponds to the
following two opposite rules (X ranges over Id and A over AExp):

X=A; => A ~> X=[];
A ~> X=[]; => X=A;

The first rule pulls A from the syntactic context X=A; and schedules it
for processing. The second rule plugs A back into its context.
Inspired by the chemical abstract machine, we call rules of the first
type above heating rules and rules of the second type cooling rules.
Similar rules are generated for other arguments in which operations are
strict. Iterative applications of heating rules eventually bring to the
top of the computation atomic tasks, such as a variable lookup, or a
builtin operation, which then make computational progress by means of other
rules. Once progress is made, cooling rules can iteratively plug the result
back into context, so that heating rules can pick another candidate for
reduction, and so on and so forth.

When operations are strict only in some of their arguments, the corresponding
positions of the arguments in which they are strict are explicitly enumerated
in the argument of the strict attribute, e.g., strict(2) like above, or
strict(2 3) for an operation strict in its second and third arguments, etc.
If an operation is simply declared strict then it means that it is strict
in all its arguments. For example, the strictness of addition yields:

A1+A2 => A1 ~> []+A2
A1 ~> []+A2 => A1+A2
A1+A2 => A2 ~> A1+[]
A2 ~> A1+[] => A1+A2

It can be seen that such heating/cooling rules can easily lead to
non-determinism, since the same term may be heated many different ways;
these different evaluation orders may lead to different behaviors in some
languages (not in IMP, because its expressions do not have side effects,
but we will experiment with non-determinism in its successor, IMP++).
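
For instance, here is one possible heating/computing/cooling sequence for
the term (1+2)+x (one possible order among several):

(1+2)+x  =>  1+2 ~> []+x    // heat the first argument
         =>  3 ~> []+x      // the addition rule computes 1+2
         =>  3+x            // cool: plug the result back into context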

A similar desugaring applies to sequential strictness, declared with the
keyword seqstrict. While the order of arguments of strict is irrelevant,
it matters in the case of seqstrict: they are to be evaluated in the
specified order; if no arguments are given, then they are assumed by default
to be evaluated from left-to-right. For example, the default heating/cooling
rules associated to the sequentially strict <= construct above are
(A1, A2 range over AExp and I1 over Int):

A1<=A2 => A1 ~> []<=A2
A1 ~> []<=A2 => A1<=A2
I1<=A2 => A2 ~> I1<=[]
A2 ~> I1<=[] => I1<=A2

In other words, A2 is only heated/cooled after A1 is already evaluated.

While the heating/cooling rules give us a nice and uniform means to define
all the various allowable ways in which a program can evaluate, all based
on rewriting, the fact that they are reversible comes with a serious practical
problem: they make the K definitions unexecutable, because they lead to
non-termination.

Strictness in Practice; K Results

To break the reversibility of the theoretical heating/cooling rules, and,
moreover, to efficiently execute K definitions, the current implementation of
the K tool relies on users giving explicit definitions of their languages'
results.

The K tool provides a predicate isKResult, which is automatically defined
as we add syntactic constructs to KResult (in fact the K tool defines such
predicates for all syntactic categories, which are used, for example, as
rule side conditions to check user-declared variable memberships, such as
V:Val stating that V belongs to Val).

The kompile tool, depending upon what it is requested to do, changes the
reversible heating/cooling rules corresponding to evaluation strategy
definitions (e.g., those corresponding to strictness attributes) to avoid
non-termination. For example, when one is interested in obtaining an
executable model of the language (which is the default compilation mode of
kompile), then heating is performed only when the to-be-pulled syntactic
fragment is not a result, and the corresponding cooling only when the
to-be-plugged fragment is a result. In this case, e.g., the heating/cooling
rules for assignment are modified as follows:

X=A; => A ~> X=[];  requires notBool isKResult(A)
A ~> X=[]; => X=A;  requires isKResult(A)

Note that non-termination of heating/cooling is avoided now. The only thing
lost is the number of possible behaviors that a program can manifest, but
this is irrelevant when all we want is one behavior.
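
The heating/cooling rules for the other strict operations are modified
analogously; for example, by analogy with the assignment rules above, for
the first argument of + we would get:

A1+A2 => A1 ~> []+A2  requires notBool isKResult(A1)
A1 ~> []+A2 => A1+A2  requires isKResult(A1)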

As will be discussed in the IMP++ tutorial, the heating/cooling rules are
modified differently by kompile when we are interested in other aspects
of the language definition, such as, for example, a searchable model that
comprises all program behaviors. This latter model is obviously more general
from a theoretical perspective, but, in practice, it is also slower to execute.
The kompile tool strives to give you the best model of the language for the
task you are interested in.

Can't Results be Inferred Automatically?

This is a long story, but the short answer is: no! Maybe in some cases
it is possible, but we prefer not to attempt it in the K tool. For example,
you most likely do not want any stuck computation to count as a result,
since some of them can happen simply because you forgot a semantic rule that
could have further reduced it! Besides, in our experience with defining large
languages, it is quite useful to take your time and think of what the results
of your language's computations are. This fact in itself may help you improve
your overall language design. We typically do it at the same time with
defining the evaluation strategies of our languages. Although in theory K
could infer the results of your language as the stuck computations, based on
the above we have deliberately decided to not provide this feature, in spite
of requests from some users. So you currently do have to explicitly define
your K results if you want to effectively use the K tool. Note, however, that
theoretical definitions, not meant to be executed, need not worry about
defining results (that's because in theory semantic rules apply modulo the
reversible heating/cooling rules, so results are not necessary).

A K Rule Involving Cells

All our K rules so far in the tutorial were of the form

rule left => right requires condition

where left and right were syntactic, or more generally computation, terms.

Here is our first K rule explicitly involving cells:

rule <k> X:Id => I ...</k> <state>... X |-> I ...</state>

Recall that the k cell holds computations, which are sequences of tasks
separated by ~>. Also, the state cell holds a map, which is a set of
bindings, each binding being a pair of computations (currently, the
K builtin data-structures, like maps, are untyped; or, said differently,
they are all over the type of computations, K).

Therefore, the two cells mentioned in the rule above hold collections
of things, ordered or not. The ...s, which we also call cell frames,
stand for more stuff there, which we do not care about.

The rewrite relation => is allowed in K to appear anywhere in a term, its
meaning being that the corresponding subterm is rewritten as indicated in the
shown context. We say that K's rewriting is local.

The rule above says that if the identifier X is the first task in the k
cell, and if X is bound to I somewhere in the state, then X rewrites
to I locally in the k cell. Therefore, IMP variables need to be already
declared when looked up.

Of course, the K rule above can be translated into an ordinary rewrite rule
of the form

rule <k> X ~> Rest </k> <state> Before (X |-> I) After </state>
  => <k> I ~> Rest </k> <state> Before (X |-> I) After </state>

Besides being more verbose and thus tedious to write, this ordinary rule
is also more error-prone; for example, we may forget the Rest variable
in the right-hand-side, etc. Moreover, the concurrent semantics of K
allows for its rules to be interpreted as concurrent transactions, where
the context is the read-only component of the transaction, while the
subterms which are rewritten are the read/write component of the transaction;
thus, K rule instances can apply concurrently if they only overlap
on read-only parts, while they cannot if regarded as ordinary rewrite logic
rules. Note: our current implementation of the K tool is not concurrent,
so K rules are in fact desugared as normal rewrite rules in the K tool.

Kompile imp.k using a documentation option and check out how the K rule
looks in the generated document. The ... frames are displayed as cell
tears, metaphorically implying that those parts of the cells that we
do not care about are torn away. The rewrite relation is replaced by a
horizontal line: specifically, the subterm which rewrites, X, is
underlined, and its replacement is written underneath the line.

In the next lesson we define the complete K semantics of IMP and
run the programs we parsed in the first lesson.

Go to Lesson 4, IMP: Configuration Abstraction, Part 1; Types of Rules.

MOVIE (out of date) [10'30"]

Configuration Abstraction, Part 1; Types of Rules

Here we will complete the K definition of IMP and, while doing so, we will
learn the very first step of what we call configuration abstraction, and
the semantic distinction between structural and computational rules.

The IMP Semantic Rules

Let us add the remaining rules, in the order in which the language constructs
were defined in IMP-SYNTAX.

The rules for the arithmetic and Boolean constructs are self-explanatory.
Note, however, that K will infer the correct sorts of all the variables in
these rules, because they appear as arguments of the builtin operations
(_+Int_, etc.). Moreover, the inferred sorts will be enforced dynamically.
Indeed, we do not want to apply the rule for addition, for example, when the
two arguments are not integers. In the rules for &&, although we prefer to
not do it here for simplicity, we could have eliminated the dynamic check by
replacing B (and similarly for _) with B:K. Indeed, it can be shown
that whenever any of these rules apply, B (or _) is a BExp anyway.
That's because there is no rule that can touch such a B (or _); this
will become clearer shortly, when we discuss the first step of configuration
abstraction. Therefore, since we know that B will be a BExp anyway, we
could save the time it takes to check its sort; such times may look minor,
but they accumulate, so some designers may prefer to avoid run-time checks
whenever possible.

The block rules are trivial. However, the rule for non-empty blocks is
semantically correct only because we do not have local variable declarations
in IMP. We will have to change this rule in IMP++.

The assignment rule has two =>: one in the k cell dissolving the
assignment statement, and the other in the state cell updating the value of
the assigned variable. Note that the one in the state is surrounded by
parentheses: (_ => I). That is because => is greedy: it matches as much
as it can to the left and to the right, until it reaches the cell boundaries
(closed or open). If you want to limit its scope, or for clarity, you can use
parentheses like here.
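
For reference, the assignment rule in imp.k looks along these lines:

rule <k> X = I:Int; => . ...</k> <state>... X |-> (_ => I) ...</state>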

The rule for sequential composition simply desugars S1 S2 into S1 ~> S2.
Indeed, the two have exactly the same semantics. Note that statements
evaluate to nothing (.), so once S1 is processed in S1 ~> S2, then the
next task is automatically S2, without wasting any step for the transition.
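
Concretely, the desugaring rule is essentially the following (the
structural attribute is discussed at the end of this lesson):

rule S1:Stmt S2:Stmt => S1 ~> S2  [structural]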

The rules for the conditional and while statements are clear. One thing to
keep in mind now is that the while unrolling rule will not apply
indefinitely in the positive branch of the resulting conditional, because
of K's configuration abstraction, which will be discussed shortly.

An IMP program declares a set of variables and then executes a
statement in the state obtained after initializing all those variables
to 0. The rules for programs initialize the declared variables one by one,
checking also that there are no duplicates. We check for duplicates only for
demonstration purposes, to illustrate the keys predefined operation that
returns the set of keys of a map, and the set membership operation in.
In practice, we typically define a static type checker for our language,
which we execute before the semantics and reject inappropriate programs.

The use of the .Ids in the second rule is not necessary. We could have
written int; S instead of int .Ids; S and the K tool would parse and
kompile the definition correctly, because the semantics is parsed with the
same parser used for programs. However, we typically prefer to
explicitly write the nothing values in the semantics, for clarity;
the parser has been extended to accept these. Note that the first rule
matches the entire k cell, because int_;_ is the top-level program
construct in IMP, so there is nothing following it in the computation cell.
The anonymous variable stands for the second argument of this top-level program
construct, not for the rest of the computation. The second rule could have
also been put in a complete k cell, but we preferred not to, for simplicity.
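
For reference, here is a sketch of these two rules, close to what imp.k
contains:

rule <k> int (X,Xs => Xs);_ </k>
     <state> Rho:Map (.Map => X |-> 0) </state>
  requires notBool (X in keys(Rho))

rule int .Ids; S => S  [structural]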

Our IMP semantics is now complete, but there are a few more things that we
need to understand and do.

Configuration Abstraction, Part 1

First, let us briefly discuss the very first step of configuration abstraction.
In K, all semantic rules are in fact rules between configurations. As will
soon be explained in the IMP++ tutorial, the declared configuration cell
structure is used to automatically complete the missing configuration parts
in rules.
However, many rules do not involve any cells, being rules between syntactic
terms (of sort K); for example, we had only three rules involving cells in our
IMP semantics. In this case, the k cell will be added automatically and the
actual rewrite will happen on top of the enclosed computation. For example,
the rule for the while loop is automatically translated into the following:

rule <k> while (B) S => if (B) {S while (B) S} else {} ...</k>

Since the first task in computations is what needs to be done next, the
intuition for this rule completion is that the syntactic transition
only happens when the term to rewrite is ready for processing. This explains,
for example, why the while loop unrolling does not indefinitely apply in the
positive branch of the conditional: the inner while loop is not ready for
evaluation yet. We call this rule completion process, as well as other
similar ones, configuration abstraction. That is because the incomplete
rule abstracts away the configuration structure, thus being easier to read.
As seen soon when we define IMP++, configuration abstraction is not only a
user convenience; it actually significantly increases the modularity of our
definitions. The k-cell-completion is only the very first step, though.

If you really want certain rewrites over syntactic terms to apply
anywhere they match, then you should tag the rule with the attribute
anywhere, which was discussed in Tutorial 1, Lesson 2.5.

Structural vs. Computational Rules

The K rules are of two types: structural and computational. Intuitively,
structural rules rearrange the configuration so that computational rules can
apply. Structural rules therefore do not count as computational steps. A K
semantics can be thought of as a generator of transition systems, one for each
program. It is only the computational rules that create steps, or transitions,
in the corresponding transition system, the structural rules being unobservable
at this level. By default, rules are all assumed computational, except for
the implicit heating/cooling rules that define evaluation strategies of
language constructs, which are assumed structural. If you want to explicitly
make a rule structural, then you should include the tag (or attribute)
structural in square brackets right after the rule. These attributes may be
taken into account by different K tools, so it is highly recommended to spend
a moment or two after each rule and think whether you want it to be structural
or computational.

Let us do it. We want the lookup and the arithmetic and Boolean construct
rules to be computational, because they make computational progress whenever
they apply. However, the block rules can be very well structural, because
we can regard them simply as syntactic grouping constructs. In general,
we want to have as few computational rules as possible, because we want
the resulting transition systems to be smaller for analysis purposes, but
not so few that we lose behaviors. For example, making the block rules structural
loses no meaningful behaviors. Similarly, the sequential composition,
the while loop unrolling, and the no-variable declaration rules can all
safely be structural.
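
For example, the two block rules can be tagged as follows:

rule {} => .   [structural]
rule {S} => S  [structural]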

Kompile and then krun the programs that you only parsed in Lesson 1. They
should all execute as expected. The state cell shows the final state
of the program. The k cell shows the final code contents, which should be
empty whenever the IMP program executes correctly.

Kompile also with the documentation option and take a look at the generated
documentation. The assignment rule should particularly be of interest,
because it contains two local rewrites.

In the next lesson we comment the IMP definition and conclude this tutorial.

Go to Lesson 5, IMP: Completing and Documenting IMP.

MOVIE (out of date) [09'16"]

Completing and Documenting IMP

We here learn no new concepts, but it is a good moment to take a break
and contemplate what we learned so far.

Let us add lots of formal annotations to imp.k.

Once we are done with the annotations, we kompile with the documentation
option and then take a look at the produced document. We often call these
documents language posters. Depending on how much information you add to
these language posters, they can serve as standalone, formal presentations
of your languages. For example, you can print them as large posters and
post them on the wall, or in poster sessions at conferences.

This completes our second tutorial. The next tutorials will teach us more
features of the K framework, such as how to define languages with complex
control constructs (like callcc), languages which are concurrent, and so on.

MOVIE (out of date) [03'45"]

Part 3: Defining LAMBDA++

Here you will learn how to define language constructs which abruptly change
the execution control flow, and how to define language semantics following
an environment/store style. Specifically, you will learn the following:

  • How to define constructs like callcc, which allow you to take snapshots of
    program executions and to go back in time at any moment.
  • How to define languages in an environment/store style.
  • Some basic notions about the use of closures and closure-like semantic
    structures to save and restore execution environments.
  • Some basic intuitions about reusing existing semantics in new languages,
    as well as some of the pitfalls in doing so.

Abrupt Changes of Control

Here we add call-with-current-continuation (callcc) to the definition of
LAMBDA completed in Tutorial 1, and call the resulting language LAMBDA++.
While doing so, we will learn how to define language constructs that
abruptly change the execution control flow.

Take over the lambda.k definition from Lesson 8 in Part 1 of this Tutorial,
which is the complete definition of the LAMBDA language, but without the
comments.

callcc is a good example for studying the capabilities of a framework to
support abrupt changes of control, because it is one of the most
control-intensive language constructs known. Scheme is probably the first
programming language that incorporated the callcc construct, although
similar constructs have been recently included in many other languages in
one form or another.

Here is a quick description: callcc e passes the remaining computation
context, packaged as a function k, to e (which is expected to be a function);
if during its evaluation e passes any value to k, then the current
execution context is discarded and replaced by the one encoded by k and
the value is passed to it; if e evaluates normally to some value v and
passes nothing to k in the process, then v is returned as a result of
callcc e and the execution continues normally. For example, we want the
program callcc-jump.lambda:

(callcc (lambda k . ((k 5) + 2))) + 10

to evaluate to 15, not 17! Indeed, the computation context [] + 10 is
passed to callcc's argument, which then sends it a 5, so the computation
resumes to 5 + 10. On the other hand, the program callcc-not-jump.lambda

(callcc (lambda k . (5 + 2))) + 10

evaluates to 17.

If you like playing games, you can metaphorically think of callcc e as
saving your game state in a file and passing it to your friend e.
Then e can decide at some moment to drop everything she was doing, load
your game and continue to play it from where you were.

The behavior of many popular control-changing constructs can be obtained
using callcc. The program callcc-return.lambda shows, for example, how to
obtain the behavior of a return statement, which exits the current execution
context inside a function and returns a value to the caller's context:

letrec f x = callcc (lambda return . (
  f (if (x <= 0) then ((return 1) / 0) else 2)
))
in (f -3)

This should evaluate to 1, in spite of the recursive call to f
and of the division by zero! Note that return is nothing but a variable
name, but one which is bound to the current continuation at the beginning of
the function execution. As soon as 1 is passed to return, the computation
jumps back in time to where callcc was defined! Change -3 to 3 and the
program will loop forever.

callcc is quite a powerful and beautiful language construct, although one
which is admittedly hard to give semantics to in some frameworks.
But not in K 😃 Here is the entire K syntax and semantics of callcc:

syntax Exp ::= "callcc" Exp  [strict]
syntax Val ::= cc(K)
rule <k> (callcc V:Val => V cc(K)) ~> K </k>
rule <k> cc(K) V ~> _ =>  V ~> K </k>

Let us first discuss the annotated syntax. We declared callcc strict,
because its argument may not necessarily be a function yet, so it may need
to be evaluated. As explained above, we need to encode the remaining
computation somehow and pass it to callcc's argument. More specifically,
since LAMBDA is call-by-value, we have to encode the remaining computation as
a value. We do not want to simply subsort computations to Val, because there
are computations which we do not want to be values. A simple solution to
achieve our goal here is to introduce a new value construct, say cc (from
current-continuation), which holds any computation.

Note that, inspired by SDF,
K allows you to define the syntax of helping semantic operations, like cc,
more compactly. Typically, we do not need a fancy syntax for such operators;
all we need is a name, followed by open parenthesis, followed by a
comma-separated list of arguments, followed by closed parenthesis. If this
is the syntax that you want for a particular construct, then K allows you to
drop all the quotes surrounding the terminals, as we did above for cc.

The semantic rules do exactly what the English semantics of callcc says.
Note that here, unlike in our definition of LAMBDA in Tutorial 1, we had
to mention the cell <k/> in our rules. This is because we need to make sure
that we match the entire remaining computation, not only a fragment of it!
For example, if we replace the two rules above with

rule (callcc V:Val => V cc(K)) ~> K
rule cc(K) V ~> _ =>  V ~> K

then we get a callcc which is allowed to non-deterministically pick a
prefix of the remaining computation and pass it to its argument, and then
when invoked within its argument, a non-deterministic prefix of the new
computation is discarded and replaced by the saved one. Wow, that would
be quite a language! Would you like to write programs in it? 😃

Consequently, in K we can abruptly change the execution control flow of a
program by simply changing the contents of the <k/> cell. This is one of
the advantages of having an explicit representation of the execution context,
like in K or in reduction semantics with evaluation contexts. Constructs like
callcc are very hard and non-elegant to define in frameworks such as SOS,
because those implicitly represent the execution context as proof context,
and the latter cannot be easily changed.

Now that we know how to handle cells in configurations and use them in rules,
in the next lesson we take a fresh look at LAMBDA and define it using
an environment-based style, which avoids the complexity of substitution
(e.g., having to deal with variable capture) and is closer in spirit to how
functional languages are implemented.

Go to Lesson 2, LAMBDA++: Semantic (Non-Syntactic) Computation Items.

MOVIE (out of date) [6'28"]

Semantic (Non-Syntactic) Computation Items

In this lesson we start another semantic definition of LAMBDA++, which
follows a style based on environments instead of substitution. In terms of
K, we will learn how easy it is to add new items to the syntactic category
of computations K, even ones which do not have a syntactic nature.

An environment binds variable names of interest to locations where their
values are stored. The idea of environment-based definitions is to maintain
a global store mapping locations to values, and then have environments
available when we evaluate expressions telling where the variables are
located in the store. Since LAMBDA++ is a relatively simple language, we
only need to maintain one global environment. Following a style similar
to that of IMP, we place all cells into a top cell T:

configuration <T>
                <k> $PGM:Exp </k>
                <env> .Map </env>
                <store> .Map </store>
              </T>

Recall that $PGM is where the program is placed by krun after parsing. So
the program execution starts with an empty environment and an empty store.

In environment-based definitions of lambda-calculi, lambda abstractions
evaluate to so-called closures:

rule <k> lambda X:Id . E => closure(Rho,X,E) ...</k>
     <env> Rho </env>

A closure is like a lambda abstraction, but it also holds the environment
in which it was declared. This way, when invoked, a closure knows where to
find in the store the values of all the variables that its body expression
refers to. We will define the lookup rule shortly.

Therefore, unlike in the substitution-based definitions of LAMBDA and
LAMBDA++, neither the lambda abstractions nor the identifiers are values
anymore here, because they both evaluate further: lambda abstractions to
closures and identifiers to their values in the store. In fact, the only
values at this moment are the closures, and they are purely semantic entities,
which cannot be used explicitly in programs. That's why we modified the
original syntax of the language to no longer include a Val syntactic
category, and that's why we need to add closures as values now; as
before, we add a Val syntactic category which is subsorted
to KResult. In general, whenever you have any strictness attributes,
you should also define some K results.
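
Concretely, the declarations look along these lines:

syntax Val ::= closure(Map,Id,Exp)
syntax KResult ::= Val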

Invoking a closure is a bit more involved than the substitution-based
beta-reduction: we need to switch to the closure's environment, then create a
new, or fresh, binding for the closure's parameter to the value passed to the
closure, then evaluate the closure's body, and then switch back to the
caller's environment, which needs to be stored somewhere in the meanwhile.
We can do all these with one rule:

rule <k> closure(Rho,X,E) V:Val => E ~> Rho' ...</k>
     <env> Rho' => Rho[X <- !N] </env>
     <store>... .Map => (!N:Int |-> V) ...</store>

Therefore, we atomically do all the following:

  • switch the computation to the closure's body, E, followed by a
    caller-environment-recovery task Rho' (note that Rho' is the
    current environment),
  • generate a fresh location !N (the ! is important, we discuss it below),
    bind X to !N in closure's environment and switch the current environment
    Rho' to that one,
  • write the value passed to the closure, V, at location !N.

This was the most complex K rule we've seen so far in the tutorial. Note,
however, that this one rule achieves a lot. It is, in fact, quite compact
considering how much it does. Note also that everything that this K rule
mentions is needed also conceptually in order to achieve this task, so it
is minimal from that point of view. That would not be the case if we
used, instead, a conventional rewrite rule, because we would have had to
mention the remaining store, say Sigma, in both sides of the rule, to say
it stays unchanged. Here we just use ....

The declaration of the fresh variable above, !N, is new and needs
some explanation. First, note that !N appears only in the right-hand-side
terms in the rule, that is, it is not matched when the rule is applied.
Instead, a fresh Int element is generated each time the rule is applied.
In K, we can define syntactic categories which have the capability to
generate fresh elements like above, using unbound variables whose name starts
with a !. The details of how to do that are beyond the scope of this
tutorial (see Tutorial 6). All we need to know here is that an arbitrary
fresh element of that syntactic category is generated each time the rule
is applied. We cannot rely on the particular name or value of the generated
element, because that can change with the next version of the K tool, or
even from execution to execution with the same version. All you can rely
on is that each newly generated element is distinct from the previously
generated elements for the same syntactic category.

Unlike in the substitution-based definition, we now also need a lookup rule:

rule <k> X => V ...</k>
     <env>... X |-> N ...</env>
     <store>... N |-> V ...</store>

This rule speaks for itself: replace X by the value V located in the store
at X's location N in the current environment.

The only thing left to define is the auxiliary environment-recovery operation:

rule <k> _:Val ~> (Rho => .) ...</k> <env> _ => Rho </env>

When the item preceding the environment recovery task Rho in the
computation becomes a value, replace the current environment with Rho
and dissolve Rho from the computation.

Before we kompile, let us make this rule and the lambda evaluation rule
structural, because we do not want these to count as transitions.

Let us kompile and ... fail:

kompile lambda

gives a parsing error saying that V:Val does not fit there in the closure
invocation rule. That's because Val and Exp are currently completely
disconnected, so K rightfully complains that we want to apply a value to
another one, because application was defined to work with expressions, not
values. What we forgot here was to state that Exp includes Val:

syntax Exp ::= Val

Now everything works, but it is a good time to reflect a bit.

So we added closures, which are inherently semantic entities, to the syntax
of expressions. Does that mean that we can now write LAMBDA programs with
closures in them? Interestingly, with our current definition of LAMBDA,
which purposely did not follow the nice organization of IMP into syntax and
semantic modules, and with K's default parser, kast, you can. But you are
not supposed to exploit this! In fact, if you use an external parser, that
parser will reject programs with explicit closures. Also, if we split the
LAMBDA definition into two modules, one called LAMBDA-SYNTAX containing
exclusively the desired program syntax and one called LAMBDA importing the
former and defining the syntax of the auxiliary operations and the semantics,
then even K's default parser will reject programs using auxiliary syntactic
constructs.

Indeed, when you kompile a language, say lang.k, the tool will by default
attempt to find a module LANG-SYNTAX and generate the program parser from
that. If it cannot find it, then it will use the module LANG instead. There
are also ways to tell kompile precisely which syntax module you want to use
for the program parser if you don't like the default convention.
See kompile --help.
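
For example, assuming the desired syntax module is called LAMBDA-SYNTAX,
the invocation would look something like:

kompile lambda.k --syntax-module LAMBDA-SYNTAX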

Another insightful thing to reflect upon is the relationship between your
language's values and other syntactic categories. It is often the case that
values form a subset of the original language syntax, like in IMP (Part 2 of
the tutorial), but sometimes that is not true, like in our case here. When
that happens, in order for the semantics to be given smoothly and uniformly
using the original syntax, you need to extend your language's original
syntactic categories with the new values. The same holds true in other
semantic approaches, not only in K, even in ones which are considered purely
syntactic. As it should be clear by now, K does not enforce you to use a
purely syntactic style in your definitions; nevertheless, K does allow you to
develop purely syntactic definitions, like LAMBDA in Part 1 of the tutorial,
if you prefer those.

krun some programs, such as those provided in Lesson 1 of the LAMBDA
tutorial (Part 1). Note the closures, both as results in the <k/> cell,
and as values in the store. Also, since variables are not values anymore,
expressions that contain free variables may get stuck with one of those on
top of their computation. See, for example, free-variable-capture.lambda,
which gets stuck on z, because z is free, so it cannot evaluate it.
If you want, you can go ahead and manually provide a configuration with
z mapped to some location in the environment and that location mapped to
some value in the store, and then you can also execute this program. The
program omega.lambda should still loop.

Although we completely changed the definitional style of LAMBDA, the semantics
of the other constructs do not need to change, as seen in the next lesson.

Go to Lesson 3, LAMBDA++: Reusing Existing Semantics.

MOVIE (out of date) [8'02"]

Reusing Existing Semantics

In this lesson we will learn that, in some cases, we can reuse existing
semantics of language features without having to make any change!

Although the definitional style of the basic LAMBDA language changed quite
radically in our previous lesson, compared to its original definition in
Part 1 of the tutorial, we fortunately can reuse a large portion of the
previous definition. For example, let us just cut-and-paste the rest of the
definition from Lesson 7 in Part 1 of the tutorial.

Let us kompile and krun all the remaining programs from Part 1 of the
tutorial. Everything should work fine, although the store contains lots of
garbage. Garbage collection is an interesting topic, but we do not do it
here. Nevertheless, much of this garbage is caused by the intricate use of
the fixed-point combinator to define recursion. In a future lesson in this
tutorial we will see that a different, environment-based definition of
fixed-points will allocate much less memory.

One interesting question at this stage is: how do we know when we can reuse
an existing semantics of a language feature? Well, I'm afraid the answer is:
we don't. In the next lesson we will learn how reuse can fail for quite subtle
reasons, which are impossible to detect statically (and some non-experts may
fail to even detect them at all).

Go to Lesson 4, LAMBDA++: Do Not Reuse Blindly!.

MOVIE (out of date) [3'21"]

Do Not Reuse Blindly!

It may be tempting to base your decision to reuse an existing semantics of
a language feature solely on syntactic considerations; for example, to reuse
whenever the parser does not complain. As seen in this lesson, this could
be quite risky.

Let's try (and fail) to reuse the definition of callcc from Lesson 1:

syntax Exp ::= "callcc" Exp  [strict]
syntax Val ::= cc(K)
rule <k> (callcc V:Val => V cc(K)) ~> K </k>
rule <k> cc(K) V ~> _ =>  V ~> K </k>

The callcc examples that we tried in Lesson 1 work, so it may look like it works.

However, the problem is that cc(K) should also include an environment,
and that environment should also be restored when cc(K) is invoked.
Let's try to illustrate this bug with callcc-env1.lambda

let x = 1 in
  ((callcc lambda k . (let x = 2 in (k x))) + x)

where the second argument of +, x, should be bound to the top x, which
is 1. However, since callcc does not restore the environment, that x
ends up being looked up in the wrong, callcc-inner environment, so we expect
the overall result 4.

Hm, we get the right result, 3 ... (Note: you may get 4, depending on
your version of K and platform; but both 3 and 4 are possible results, as
explained below and seen in the tests). How can we get 3? Well, recall that
+ is strict, which means that it can evaluate its arguments in any order.
It just happened that in the execution that took place above its second
argument was evaluated first, to 1, and then the callcc was evaluated, but
its cc value K had already included the 1 instead of x ... In Part 4 of
the tutorial we will see how to explore all the non-deterministic behaviors of
a program; we could use that feature of K to debug semantics, too.
For example, in this case, we could search for all behaviors of this program
and we would indeed get two possible value results: 3 and 4.

One may think that the problem is the non-deterministic evaluation order
of +, and thus that all we need to do is to enforce a deterministic order
in which the arguments of + are evaluated. Let us follow this path to
see what happens. There are two simple ways to make the evaluation order
of +'s arguments deterministic. One is to make + seqstrict in the
semantics, to enforce its evaluation from left-to-right. Do it and then
run the program above again; you should get only one behavior for the
program above, 4, which therefore shows that copying-and-pasting our old
definition of callcc was incorrect. However, as seen shortly, that only
fixed the problem for the particular example above, but not in general.
Another conventional approach to enforce the desired evaluation order is to
modify the program to enforce the left-to-right evaluation order using let
binders, as we do in callcc-env2.lambda:

let x = 1 in
  let a = callcc lambda k . (let x = 2 in (k x)) in
    let b = x in
      (a + b)

With your installation of K you may get the "expected" result 4 when you
execute this program, so it may look like our non-deterministic problem is
fixed. Unfortunately, it is not. Using the K tool to search for all the
behaviors in the program above reveals that the final result 3 is still
possible. Moreover, both the 3 and the 4 behaviors are possible regardless
of whether + is declared to be seqstrict or just strict. How is that
possible? The problem is now the non-deterministic evaluation strategy of
the function application construct. Indeed, recall that the semantics of
the let-in construct is defined by desugaring to lambda application:

rule let X = E in E' => (lambda X . E') E     [macro]

With this, the program above eventually reduces to

(lambda a . ((lambda b . a + b) x))
(callcc lambda k . (let x = 2 in (k x)))

in an environment where x is 1. If the first expression evaluates first,
then it does so to a closure in which x is bound to a location holding 1,
so when applied later on to the x inside the argument of callcc (which is
2), it will correctly lookup x in its enclosed environment and thus the
program will evaluate to 3. On the other hand, if the second expression
evaluates first, then the cc value freezes the first expression as is,
severing the relationship between its x and the current environment, in
which x is bound to 1; x is then inadvertently captured by the environment
of the let-in construct inside the callcc, making the entire expression
evaluate to 4.

So the moral is: Do not reuse blindly. Think!

In the next lesson we fix the environment-based semantics of callcc by having
cc also wrap an environment, besides a computation. We will also give a more
direct semantics to recursion, based on environments instead of fixed-point
combinators.

Go to Lesson 5, LAMBDA++: More Semantic Computation Items.

MOVIE (out of date) [3'37"]

More Semantic Computation Items

In this lesson we see more examples of semantic (i.e., non-syntactic)
computational items, and how useful they can be. Specifically, we fix the
environment-based definition of callcc and give an environment-based
definition of the mu construct for recursion.

Let us first fix callcc. As discussed in Lesson 4, the problem that we
noticed there was that we only recovered the computation, but not the
environment, when a value was passed to the current continuation. This is
quite easy to fix: we modify cc to take both an environment and a
computation, and its rules to take a snapshot of the current environment with
it, and to recover it at invocation time:

syntax Val ::= cc(Map,K)
rule <k> (callcc V:Val => V cc(Rho,K)) ~> K </k> <env> Rho </env>
rule <k> cc(Rho,K) V:Val ~> _ =>  V ~> K </k> <env> _ => Rho </env>

Let us kompile and make sure it works with the callcc-env2.lambda program,
which should evaluate to 3, not to 4.

Note that the cc value, which can be used as a computation item in the <k/>
cell, is now quite semantic in nature, pretty much the same as the closures.

Let us next add one more closure-like semantic computational item, for mu.
But before that, let us reuse the semantics of letrec in terms of mu that
was defined in Lesson 8 of Part 1 of the tutorial on LAMBDA:

syntax Exp ::= "letrec" Id Id "=" Exp "in" Exp
             | "mu" Id "." Exp      [latex(\mu{#1}.{#2})]
rule letrec F:Id X = E in E' => let F = mu F . lambda X . E in E'    [macro]

We removed the binder annotation of mu, because it is no longer necessary
(we no longer work with substitution).

To save the number of locations needed to evaluate mu X . E, let us replace
it with a special closure which already binds X to a fresh location holding
the closure itself:

syntax Exp ::= muclosure(Map,Exp)

rule <k> mu X . E => muclosure(Rho[X <- !N], E) ...</k>
     <env> Rho </env>
     <store>... .Map => (!N:Int |-> muclosure(Rho[X <- !N], E)) ...</store>
  [structural]

Since E needs to be evaluated each time mu X . E is encountered during the
evaluation, we conclude that muclosure cannot be a value. We can declare
it either as an expression or as a computation. Let's go with the former.

Finally, here is the rule unrolling the muclosure:

rule <k> muclosure(Rho,E) => E ~> Rho' ...</k>
     <env> Rho' => Rho </env>

Note that the current environment Rho' needs to be saved before and
restored after E is executed, because the fixed point may be invoked
from a context with a completely different environment from the one
in which mu X . E was declared.

We are done. Let us now kompile and krun factorial-letrec.lambda from
Lesson 7 in Part 1 of the tutorial on LAMBDA. Recall that in the previous
lesson this program generated a lot of garbage into the store, due to the
need to allocate space for the arguments of all those lambda abstractions
needed to run the fixed-point combinator. Now we need far fewer locations,
essentially only locations for the argument of the factorial function, one at
each recursive call. Anyway, much better than before.

In the next lesson we wrap up the environment definition of LAMBDA++ and
generate its documentation.

Go to Lesson 6, LAMBDA++: Wrapping Up and Documenting LAMBDA++.

MOVIE (out of date) [5'19"]

Wrapping Up and Documenting LAMBDA++

In this lesson we wrap up and nicely document LAMBDA++. In doing so, we also
take the liberty of reorganizing the semantics a bit, to make it look better.

See the lambda.k file, which is self-explanatory.

Part 3 of the tutorial is now complete. Part 4 will teach you more features
of the K framework, in particular how to exhaustively explore the behaviors
of non-deterministic or concurrent programs.

MOVIE (out of date) [6'23"]

Part 4: Defining IMP++

IMP++ extends IMP, which was discussed in Part 2 of this tutorial, with several
new syntactic constructs. Also, some existing syntax is generalized, which
requires non-modular changes of the existing IMP semantics. For example,
global variable declarations become local declarations and can occur
anywhere a statement can occur. In this tutorial we will learn the following:

  • That (and how) existing syntax/semantics may change as a language evolves.
  • How to refine configurations as a language evolves.
  • How to define and use fresh elements of desired sorts.
  • How to tag syntactic constructs and rules, and how to use such tags
    with the superheat/supercool/transition options of kompile.
  • How the search option of krun works.
  • How to stream cells holding semantic lists to the standard input/output,
    and thus obtain interactive interpreters for the defined languages.
  • How to delete, save and restore cell contents.
  • How to add/delete cells dynamically.
  • More details on how the configuration abstraction mechanism works.

Like in the previous tutorials, this folder contains several lessons, each
adding new features to IMP++. Do them in order and make sure you completed
and understood the previous tutorials.

Extending/Changing an Existing Language Syntax

Here we learn how to extend the syntax of an existing language, both with
new syntactic constructs and with more general uses of existing constructs.
The latter, in particular, requires changes of the existing semantics.

Consider the IMP language, as defined in Lesson 4 of Part 2 of the tutorial.

Let us first add the new syntactic constructs, with their precedences:

  • variable increment, ++, which increments an integer variable and
    evaluates to the new value;
  • read, which reads and evaluates to a new integer from the input buffer;
  • print, which takes a comma-separated list of arithmetic expressions and
    evaluates and prints each of them in order, from left to right, to the
    output buffer; we therefore define a new list syntactic category, AExps,
    which we pass as an argument to print; note we do not want to declare
    print to be strict, because we do not want to first evaluate the
    arguments and then print them (for example, if the second argument performs
    an illegal operation, say division by zero, we still want to print the first
    argument); we also go ahead and add strings as arithmetic expressions,
    because we intend print to also take strings, in order to print nice
    messages to the user;
  • halt, which abruptly terminates the program; and
  • spawn, which takes a statement and creates a new concurrent thread
    executing it and sharing its environment with the parent thread.

Also, we want to allow local variable declarations, which can appear anywhere
a statement can appear. Their scope ranges from the place they are defined
until the end of the current block, and they can shadow previous declarations,
both inside and outside the current block. The simplest way to define the
syntax of the new variable declarations is as ordinary statements, at the same
time removing the previous Pgm syntactic category and its construct.
Programs are now just statements.

We are now done with adding the new syntax and modifying the old one.
Note that the old syntax was modified in a way which makes the previous IMP
programs still parse, but this time as statements. Let us then modify
the configuration variable $PGM to have the sort Stmt instead of Pgm,
and let us try to run the old IMP programs, for example sum.imp.

Note that they actually get stuck with the global declaration on the top
of their computations. This is because variable declarations are now treated
like any statements, in particular, the sequential composition rule applies.
This makes the old IMP rule for global variable declarations not match anymore.
We can easily fix it by replacing the anonymous variable _, which matched
the program's statement that now turned into the remaining computation in
the <k/> cell, with the cell frame variable ..., which matches the
remaining computation. Similarly, we have to change the rule for the case
where there are no variables left to declare into one that dissolves itself.

We can now run all the previous IMP programs, in spite of the fact that
our IMP++ semantics is incomplete and, more interestingly, in spite of the
fact that our current semantics of blocks is incorrect in what regards the
semantics of local variable declarations (note that the old IMP programs do
not declare block-local variables, which is why they still run correctly).

Let us also write some proper IMP++ programs, which we would like to execute
once we give semantics to the new constructs.

div.imp is a program manifesting non-deterministic behaviors due to the
desired non-deterministic evaluation strategy of division and the fact that
expressions will have side effects once we add variable increment. We will
be able to see all the different behaviors of this program. Challenge: can
you identify the behavior where the program performs a division-by-zero?

If we run div.imp now, it will get stuck with the variable increment
construct on top of the computation cell. Once we give it a semantics,
div.imp will execute completely (all the other constructs in div.imp
already have their semantics defined as part of IMP).

Note that some people prefer to define all their semantics in a by-need
style: they first write and parse lots of programs, and then they add
semantics to each language construct on which any of the programs gets
stuck, and so on and so forth, until they can run all the programs.

io.imp is a program which exercises the input/output capabilities of the
language: reads two integers and prints three strings and an integer.
Note that the variable declaration is not the first statement anymore.

sum-io.imp is an interactive variant of the sum program.

spawn.imp is a program which dynamically creates two threads that interact
with the main thread via the shared variable x. Lots of behaviors will be
seen here once we give spawn the right semantics.

Finally, locals.imp tests whether variable shadowing/unshadowing works well.

In the next lesson we will prepare the configuration for the new constructs,
and will see what it takes to adapt the semantics to the new configuration.
Specifically, we will split the state cell into an environment cell and a
store cell, like in LAMBDA++ in Part 3 of the tutorial.

Go to Lesson 2, IMP++: Configuration Refinement; Freshness.

MOVIE (out of date) [07'47"]

Configuration Refinement; Freshness

To prepare for the semantics of threads and local variables, in this lesson we
split the state cell into an environment and a store. The environment and
the store will be similar to those in the definition of LAMBDA++ in Part
3 of the Tutorial. This configuration refinement will require us to change
some of IMP's rules, namely those that used the state.

To split the state map, which binds program variables to values, into an
environment mapping program variables to locations and a store mapping
locations to values, we replace in the configuration declaration the cell

<state color="red"> .Map </state>

with two cells

<env color="LightSkyBlue"> .Map </env>
<store color="red"> .Map </store>

Structurally speaking, this split of a cell into other cells is a major
semantic change, which, unfortunately, requires us to revisit the existing
rules that used the state cell. One could, of course, argue that we could
have avoided this problem if we had followed from the very beginning the
good-practice style to work with an environment and a store, instead of a
monolithic state. While that is a valid argument, highlighting the fact that
modularity is not a feature of the framework alone, but also requires the
language designer to follow good practices, it is also true that if all we
wanted in Part 2 of the tutorial was to define IMP as is, then splitting the
state into an environment and a store would have been unnecessary and hard
to justify.

The first rule which used a state cell is the lookup rule:

rule <k> X:Id => I ...</k> <state>... X |-> I ...</state>

We modify it as follows:

rule <k> X:Id => I ...</k>
     <env>... X |-> N ...</env>
     <store>... N |-> I ...</store>

So we first match the location N of X in the environment, then the value
I at location N in the store, and finally we rewrite X to I in the
computation. This rule also shows an instance of a more complex
multiset matching, where two variables (X and N) are each matched twice.

The assignment rule is modified quite similarly.
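
Concretely, assuming IMP's original assignment syntax X = I;, the refined
rule might look like this sketch:

rule <k> X = I:Int; => . ...</k>
     <env>... X |-> N ...</env>             // find X's location N
     <store>... N |-> (_ => I) ...</store>  // overwrite the value at N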

The variable declaration rule is trickier, though, because we need to allocate
a fresh location in the store and bind the newly declared variable to it.
This is quite similar to the way we allocated space for variables in
the environment-based definition of LAMBDA++ in Part 3 of the tutorial.

rule <k> int (X,Xs => Xs); ...</k>
     <env> Rho => Rho[X <- !N:Int] </env>
     <store>... .Map => !N |-> 0 ...</store>

Note the use of the fresh (!N) variable notation above. Recall from
the LAMBDA++ tutorial that each time the rule with fresh (!) variables is
applied, fresh elements of corresponding sorts are generated for the fresh
variables, distinct from all the previously generated elements; also, we
cannot and should not assume anything about the particular element that is
being generated, except that it is different from the previous ones.

kompile and krun sum.imp to see how the fresh locations have been
generated and used. There were two fresh locations needed, for the two
variables. Note also that a cell holding the counter has been added to the
configuration.

In the next lesson we will add the semantics of variable increment, and see
how that yields non-deterministic behaviors in programs and how to explore
those behaviors using the K tool.

Go to Lesson 3, IMP++: Tagging; Transition Kompilation Option.

MOVIE (out of date) [04'06"]

Tagging; Transition Kompilation Option

In this lesson we add the semantics of variable increment. In doing so, we
learn how to tag syntactic constructs and rules and then use such tags to
instruct the kompile tool to generate the desired language model that is
amenable for exhaustive analysis.

The variable increment rule is self-explanatory:

rule <k> ++X => I +Int 1 ...</k>
     <env>... X |-> N ...</env>
     <store>... N |-> (I => I +Int 1) ...</store>

We can now run programs like our div.imp program introduced in Lesson 1.
Do it.

The addition of increment makes the evaluation of expressions have side
effects. That, in combination with the non-determinism allowed by the
strictness attributes in how expression constructs evaluate their
arguments, makes expressions in particular and programs in general have
non-deterministic behaviors. One possible execution of the div.imp program
assigns 1 to y's location, for example, but this program manifests several
other behaviors, too.

To see all the (final-state) behaviors that a program can have, you can call
the krun tool with the option --search. For example:

krun div.imp --search

Oops, we see only one solution, the same as when we ran it without search.

Here is what happens. krun can only explore as much of the transition
system associated to a program as kompile allowed the generated language
model to yield. Since most K users are interested in language models
that execute efficiently, that is, in faster interpreters for the defined
languages, by default kompile optimizes the generated language model for
execution. In particular, it inserts no backtracking markers, which krun
uses when called with the --search option in order to systematically generate
the entire transition system associated to a program. This is why krun
showed us only one solution when run with the --search option on div.imp.

We next explain how to tell kompile what kind of language model we are
interested in for analysis purposes. When you experiment with non-determinism
in a language semantics, you should keep in mind that the --transition
option of kompile allows you to configure what counts as a transition in
your language model. We here only discuss transitions due to the
non-deterministic evaluation strategies of language constructs, but we will
see in future lectures (see Lesson 6 of IMP++, where we add concurrency) that
we can also have transitions due to non-deterministic applications of rewrite
rules.

If you want to explore the entire behavior space due to non-deterministic
evaluation strategies, then you should include all the language constructs
in the --transition option. This may sound like the obvious thing to
always do, but once you do it you quickly realize that it is way too much
in practice when you deal with large languages or programs. There are simply
too many program behaviors to consider, and krun will likely hang
or crash on you. For example, a small ten-statement program where each
statement uses one strict expression construct already has 1000+ behaviors for
krun to explore! Driven by practical needs of its users, the K tool
therefore allows you to finely tune the generated language models using the
--transition option.

To state which constructs are to be considered to generate transitions in the
generated language model, and for other reasons, too, the K tool allows you to
tag any production and any rule. You can do this the same way we tagged
rules with the structural keyword in earlier tutorials: put the tag in
brackets. You can associate multiple tags with the same construct or rule,
and more than one construct or rule can have the same tag. As an example,
let us tag the division construct with division, the lookup rule with lookup,
and the increment rule with increment. The rule tags are not needed in this
lesson; we add them only to demonstrate that rules can also be tagged.
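
For example, hedging on the exact attribute lists, the tagged production and
lookup rule might look like the following sketch (the increment rule would
similarly get an increment tag):

syntax AExp ::= AExp "/" AExp  [left, strict, division]

rule <k> X:Id => I ...</k>
     <env>... X |-> N ...</env>
     <store>... N |-> I ...</store>  [lookup]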

The least intrusive way to enforce our current language to explore the
entire space of behaviors due to the strictness of division is to kompile it
with the following option:

kompile imp.k --transition "division"

It is interesting to note that the lookup and increment rules are the only
two rules which can trigger non-deterministic behaviors for division, because
no other rule but these two can ever apply while a division operation is
heated. Previous versions of K allowed you to also specify which rules could
trigger non-deterministic behaviors of operator evaluation strategies,
but that option was rarely used and is not available anymore.

Note that it is highly non-trivial to say precisely whether a strict language
construct may yield non-deterministic behaviors. For example, division's
strictness would yield no non-determinism if the language had no side effects.
It is even harder to say so for a particular program. Consequently, our K
implementation makes no attempt to automatically detect which operations
should be tagged as transitions. Instead, it provides the functionality to
let you decide it.

Now the command

krun div.imp --search

shows us all five behaviors of this program. Interestingly, one
of the five behaviors yields a division by zero!

The --transition option can be quite useful when you experiment with your
language designs or when you formally analyze programs for certain kinds of
errors. Please let us know if you ever need finer-grained control over
the non-determinism of your language models.

Before we conclude this lesson, we'd like to let you know one trick, which
you will hopefully not overuse: you can tag elements in your K definition with
kompile option names, and those elements will be automatically included in
their corresponding options. For example, if you tag the division production
with transition then the command

kompile imp

is completely equivalent to the previous kompile command.
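
That is, hedging on the other attributes, the production would be declared
along these lines:

syntax AExp ::= AExp "/" AExp  [left, strict, transition]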

Please use this default behavior with caution, or even better, try to avoid
using it! You may be tempted to add the transition tag to lots of elements
and then forget about them; your language models will then become increasingly
slow when you execute them, and you may wonder why... This convention is typically
convenient when you want to quickly experiment with non-determinism and do not
want to bother inventing tag names and calling kompile with options.

In the next lesson we add input/output to our language and learn how to
generate a model of it which behaves like an interactive interpreter!

Go to Lesson 4, IMP++: Semantic Lists; Input/Output Streaming.

MOVIE (out of date) [06'56"]

Semantic Lists; Input/Output Streaming

In this lesson we add semantics to the read and print IMP++ constructs.
In doing so, we also learn how to use semantic lists and how to connect
cells holding semantic lists to the standard input and standard output.
This allows us to turn the K semantics into an interactive interpreter.

We start by adding two new cells to the configuration,

<in color="magenta"> .List </in>
<out color="Orchid"> .List </out>

each holding a semantic list, initially empty. Semantic lists are
space-separated sequences of items, each item being a term of the form
ListItem(t), where t is a term of sort K. Recall that the semantic maps,
which we use for states, environments, stores, etc., are sets of pairs
t1 |-> t2, where t1 and t2 are terms of sort K. The ListItem wrapper
is currently needed to avoid parsing ambiguities.

Since we want the print statement to also print strings, we need to tell
K that strings are results. To make it more interesting, let us also overload
the + symbol on arithmetic expressions to also take strings and, as a
result, to concatenate them. Since + is already strict, we only need to add
a rule reducing the IMP addition of strings to the builtin operation +String
which concatenates two strings.
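
A minimal sketch of these two additions, assuming strings were already added
to AExp in Lesson 1:

syntax KResult ::= String          // strings are now results

rule S1:String + S2:String => S1 +String S2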

The semantics of read is immediate: it reads and consumes the first integer
item from the <in/> cell. Note that our read only reads integer values (it
gets stuck if the first item in the <in/> cell is not an integer).
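
Assuming read was declared as a nullary construct read(), its rule might be
sketched as follows:

rule <k> read() => I ...</k>                // read() assumed; adjust to your syntax
     <in> ListItem(I:Int) => .List ...</in> // consume the first input item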

The semantics of print is a bit trickier. Recall that print takes an
arbitrary number of arithmetic expression arguments, and evaluates and outputs
each of them in order, from left to right. For example,
print("Hello", 3/0, "Bye"); outputs "Hello" and then gets stuck on the
illegal division by zero operation. In other words, we do not want it to
first evaluate all its arguments and then print them, because that would miss
outputting potentially valuable information. So the first step is to evaluate
the first argument of print. In some sense, what we'd like to say is that
print has the evaluation strategy strict(1). However, strictness
attributes only work with individual language constructs, while what we need
is an evaluation strategy that involves two constructs: print and the list
(comma) construct of AExps. If we naively associate print the strict(1)
evaluation strategy then its first and unique argument, an AExps list, will
be scheduled for evaluation and the execution will get stuck because we have
no rules for evaluating AExps terms. If we make the list construct of
AExps strict then we get the wrong semantics for print which first
evaluates all its arguments and then outputs them. The correct way to
tell K that print should evaluate only its first argument is by using a
context declaration:

context print(HOLE:AExp, _);

Note the HOLE of sort AExp above. Contexts allow us to define finer-grain
evaluation strategies than the strictness attributes, involving potentially
more than one language construct, like above. The HOLE indicates the
argument which is requested to be evaluated. For example, the strict
attribute of division corresponds to two contexts:

context HOLE / _
context _ / HOLE

In their full generality, contexts can be any terms with precisely one
occurrence of a HOLE, and with arbitrary side conditions on any variables
occurring in the context term as well as on the HOLE. See Part 6 of the
tutorial for more examples.

Once evaluated, the first argument of print is expected to become either an
integer or a string. Since we want to print both integers and string values,
to avoid writing two rules, one for each type of value, we instead add a new
syntactic category, Printable, which is the union of integers and strings.
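
One plausible rendering of the print semantics, hedging on the exact AExps
list syntax:

syntax Printable ::= Int | String
syntax AExp ::= Printable

rule <k> print((P:Printable, AEs) => AEs); ...</k>  // output head, keep tail
     <out>... .List => ListItem(P) </out>

rule print(.AExps); => .  [structural]              // dissolve when done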

Let us kompile and krun the io.imp program discussed in Lesson 1. As
expected, it gets stuck with a read construct on top of the computation and
with an empty <in/> cell. To run it, we need to provide some items in the
<in/> cell, so that the rule of read can match. Let us add

<in> ListItem(3) ListItem(5) ListItem(7) </in>

Now, if we krun io.imp, we can see that its execution completes normally
(the <k/> cell is empty), that the first two items have been removed by the
two read constructs from the <in/> cell, and that the desired strings and
numbers have been placed into the <out/> cell.

Cells holding semantic lists can be connected to the standard input and
standard output buffers, and krun knows how to handle these appropriately.
Let us connect the <in/> cell to the standard input using the cell attribute
stream="stdin" and the <out/> cell to the standard output with the
attribute stream="stdout". A cell connected to the standard input will
take its items from the standard input and block the rewriting process when
an input is needed until an item is available in the standard input buffer.
A cell connected to the standard output buffer will send all its items, in
order, to the standard output.

Let us kompile and krun io.imp again. It prints the message and then
waits for your input numbers. Type in two numbers, then press <Enter>.
A message with their sum is then printed, followed by the final configuration.
If you do not want to see the final configuration, and thus obtain a realistic
interpreter for our language, then call krun with the option --output none:

krun io.imp --output none

Let us now krun our interactive sum program, which continuously reads numbers
from the console and prints the sum of numbers up to them:

krun sum-io.imp

Try a few numbers, then 0. Note that the program terminated, but with junk
in the <k/> cell, essentially with a halt statement on top. That is, of
course, because halt has been reached but has no semantics yet.

In the next lesson we give the semantics of halt and also fix the semantics
of blocks with local variable declarations.

Go to Lesson 5, IMP++: Deleting, Saving and Restoring Cell Contents.

MOVIE (out of date) [05'21"]

Deleting, Saving and Restoring Cell Contents

In this lesson we will see how easily we can delete, save and/or restore
contents of cells in order to achieve the desired semantics of language
constructs that involve abrupt changes of control or environments. We have
seen similar or related K features in the LAMBDA++ language in Part 3 of the
tutorial.

Let us start by adding semantics to the halt statement. As its name says,
what we want is to abruptly terminate the execution of the program. Moreover,
we want the program configuration to look as if the program terminated
normally, with an empty computation cell. The simplest way to achieve that is
to simply empty the computation cell when halt is encountered:

rule <k> halt; ~> _ => . </k>

It is important to mention the entire <k/> cell here, with both its membranes
closed, to make sure that its entire contents is discarded. Note the
anonymous variable, which matches the rest of the computation.

kompile and krun sum-io.imp. Note that unlike in Lesson 4, the program
terminates with an empty computation cell now.

As mentioned earlier, the semantics of blocks that was inherited from IMP is
wrong. Program locals.imp shows it very clearly: the environments are not
correctly restored at block exits. One way to fix the problem is to take
a snapshot of the current environment when a block is entered and save it
somewhere, and then to restore it when the block is left. There are many
ways to do this, which you can explore on your own: for example, you can add
a new list cell for this task, in which you push/pop the environment snapshots
stack-style; or you can use the existing environment cell for this purpose,
but then you need to change the variable access rules to search through the
stacked environments for the variable.

My preferred solution is to follow a style similar to how we saved/restored
LAMBDA++ environments in Part 3 of the Tutorial, namely to use the already
existing <k/> cell for such operations. More specifically, we place a
reminder item in the computation whenever we need to take a snapshot of
some cell contents; the item simply consists of the entire contents of the cell.
Then, when the reminder item is reached, we restore the contents of the cell.
For blocks, the snapshot-taking rule is:

rule <k> {S} => S ~> Rho ...</k> <env> Rho </env>  [structural]

The only thing left now is to give the definition of environment restore:

rule <k> Rho => . ...</k> <env> _ => Rho </env>    [structural]

Done. kompile and krun locals.imp. Everything should work correctly now.
Note that the rule above is different from the one we had for LAMBDA++ in
Part 3 of the tutorial, in that here there is no value preceding the environment
restoration item in the computation; that's because IMP++ statements,
unlike LAMBDA++'s expressions, evaluate to nothing (.).

In the next lesson we will give semantics to the spawn S construct, which
dynamically creates a concurrent shared-memory thread executing statement S.

Go to Lesson 6, IMP++: Adding/Deleting Cells Dynamically; Configuration Abstraction, Part 2.

MOVIE (out of date) [04'30"]

Adding/Deleting Cells Dynamically; Configuration Abstraction, Part 2

In this lesson we add dynamic thread creation and termination to IMP, and
while doing so we learn how to define and use configurations whose structure
can evolve dynamically.

Recall that the intended semantics of spawn S is to spawn a new concurrent
thread that executes S. The new thread is being passed at creation time
its parent's environment, so it can share with its parent the memory
locations that its parent had access to at creation time. No other locations
can be shared, and no other memory sharing mechanism is available.
The parent and the child threads can evolve unrestricted, in particular they
can change their environments by declaring new variables or shadowing existing
ones, can create other threads, and so on.

The above suggests that each thread should have its own computation and its
own environment. This can be elegantly achieved if we group the <k/> and
<env/> cells in a <thread/> cell in the configuration. Since at any given
moment during the execution of a program there could be zero, one or more
instances of such a <thread/> cell in the configuration, it is a good idea
to declare the <thread/> cell with multiplicity * (i.e., zero, one or more):

<thread multiplicity="*" color="blue">
  <k color="green"> $PGM:Stmt </k>
  <env color="LightSkyBlue"> .Map </env>
</thread>

This multiplicity declaration is not necessary, but it is a good idea to do
it for several reasons:

  1. it may help the configuration abstraction process,
    which may in turn significantly increase the compactness and modularity of
    your subsequent rules;
  2. it may help various analysis and execution tools,
    for example static analyzers to give you error messages when you create cells
    where you should not, or K compilers to improve performance by starting
    actual concurrent hardware threads or processes corresponding to each cell
    instance; and
  3. it may help you better understand and control the dynamics
    of your configuration, and thus your overall semantics.

For good encapsulation, I also prefer to put all thread cells into one cell,
<threads/>. This is technically unnecessary, though; to convince yourself
that this is indeed the case, you can remove this cell once we are done with
the semantics and everything will work without having to make any changes.

Before we continue, let us kompile and krun some programs that used to
work, say sum-io.imp. In spite of the relatively radical configuration
reorganization, those programs execute just fine! How is that possible?
In particular, why do rules like the lookup and assignment still work,
unchanged, in spite of the fact that the <k/> and <env/> cells are not at
the same level with the <store/> cell in the configuration anymore?

Welcome to configuration abstraction, part 2. Recall that the role of
configuration abstraction is to allow you to only write the relevant
information in each rule, and have the compiler fill in the obvious and boring
details. According to the configuration that we declared for our new
language, there is only one reasonable way to complete rules like the lookup,
namely to place the <k/> and <env/> cells inside a <thread/> cell,
inside a <threads/> cell:

rule <threads>...
       <thread>...
         <k> X:Id => I ...</k>
         <env>... X |-> N ...</env>
       ...</thread>
     ...</threads>
     <store>... N |-> I ...</store>  [lookup]

This is the most direct, compact and local way to complete the configuration
context of the lookup rule. If for some reason you wanted here to match the
<k/> cell of one thread and the <env/> cell of another thread, then you
would need to explicitly tell K so, by mentioning the two thread cells,
for example:

rule <thread>...
         <k> X:Id => I ...</k>
     ...</thread>
     <thread>...
         <env>... X |-> N ...</env>
     ...</thread>
     <store>... N |-> I ...</store>  [lookup]

By default, K completes rules in a greedy style. Think of it this way: what is the
minimal number of changes to my rule to make it fit the declared
configuration? That's what the K tool will do.

Configuration abstraction is technically unnecessary, but once you start
using it and get a feel for how it works, it will become your best friend.
It allows you to focus on the essentials of your semantics, and at the same
time gives you flexibility in changing the configuration later on without
having to touch the rules. For example, it allows you to remove the
<threads/> cell from the configuration, if you don't like it, without
having to touch any rule.

We are now ready to give the semantics of spawn:

rule <k> spawn S => . ...</k> <env> Rho </env>
     (. => <thread>... <k> S </k> <env> Rho </env> ...</thread>)

Note configuration abstraction at work, again. Taking into account
the declared configuration, and in particular the multiplicity information
* in the <thread/> cell, the only reasonable way to complete the rule
above is to wrap the <k/> and <env/> cells on the first line within a
<thread/> cell, and to fill in the ...s in the child thread with the
default contents of the other subcells in <thread/>. In this case there
are no other cells, so we can get rid of those ...s, but that would
decrease the modularity of this rule: indeed, we may later on add other
cells within <thread/> as the language evolves, for example a function
or an exception stack, etc.

In theory, we should be able to write the rule above even more compactly
and modularly, namely as

rule <k> spawn S => . ...</k> <env> Rho </env>
     (. => <k> S </k> <env> Rho </env>)

Unfortunately, this currently does not work in the K tool, due to some
known limitations of our current configuration abstraction algorithm.
This latter rule would be more modular, because it would not even depend
on the cell name thread. For example, we may later decide to change
thread into agent, and we would not have to touch this rule.
We hope this current limitation will be eliminated soon.

Once a thread terminates, its computation cell becomes empty. When that
happens, we can go ahead and remove the useless thread cell:

rule <thread>... <k> . </k> ...</thread> => .  [structural]

Let's see what we've got. kompile and krun spawn.imp.
Note the following:

  • The <threads/> cell is empty, so all threads terminated normally;
  • The value printed is different from the value in the store; the store value
    is not even the one obtained if the threads executed sequentially.

Therefore, interesting behaviors may happen; we would like to see them all!

Based on prior experience with krun's search option, we would hope that

krun spawn.imp --search

shows all the behaviors. However, the above does not work, for two reasons.

First, spawn.imp is an interactive program, which reads a number from the
standard input. When analyzing programs exhaustively using the search option,
krun has to disable the streaming capabilities (just think about it and you
will realize why). The best you can do in terms of interactivity with search
is to pipe some input to krun: krun will flush the standard input buffer
into the cells connected to it when creating the initial configuration (it
will do that whether or not you pass the --search option).
For example:

echo 23 | krun spawn.imp --search

puts 23 in the standard input buffer, which is then transferred in the
<in/> cell as a list item, and then the exhaustive search procedure is
invoked.

Second, even after piping some input, the spawn.imp program still manifests
only one behavior, which does not seem right. There should be many more.

As explained in Lesson 3, by default kompile optimizes the generated
language model for execution. In particular, it does not insert any
backtracking markers where transition attempts should be made, so krun
lacks the information it needs to exhaustively search the generated language
model. Like we did in Lesson 3 with the language constructs, we also have
to explicitly tell kompile which rules should be considered as actual
transitions. A theoretically correct but practically infeasible approach
to search all possible behaviors is to consider all rules as transitions.
Even more than with the non-deterministic strictness of language constructs
in Lesson 3, such a naive solution would make the number of behaviors, and
thus krun, explode. Remember that a two-thread program with 150 statements
each manifests more behaviors than particles in the known universe!
Consequently, unless your multi-threaded programs are very small, you will
most likely want to control which rules should be considered transitions and
which should not.

A good rule of thumb is to include as transitions only those rules which
compete for behaviors. That is, those rules which may yield a different
behavior if we choose to apply them when other rules match as well.
The rule for addition, for example, is a clear example of a rule which
should not be a transition: indeed, 3+7 will rewrite to 10 now and also
later. On the other hand, the lookup rule should be a transition. Indeed,
if we delay the lookup of variable x, then other threads may write x in the
meanwhile (with an increment or an assignment rule) and thus yield a
different behavior.

Let us discuss and tag those rules which should be transitions: lookup and
increment need to be transitions and we already tagged them in Lesson 3;
the read rule needs to also be a transition, because it may compete with
other instances of itself in other threads; assignment needs to also be a
transition, and so should be the first rule for print.

Let us now kompile with the transition option set as desired:

kompile imp --transition "lookup increment assignment read print"

Now echo 23 | krun spawn.imp --search gives us all 12 behaviors of the
spawn.imp program.

Like for the non-deterministically strict operations which can be tagged as
transitions, it is highly non-trivial to say precisely which rules need
to be transitions, so the K tool makes no attempt to detect them
automatically. Instead, it provides the functionality to let you decide.

We currently have no mechanism for thread synchronization. In the next lesson
we add a join statement, which allows a thread to wait until another completes.

Go to Lesson 7, IMP++: Everything Changes: Syntax, Configuration, Semantics.

MOVIE (out of date) [11'40"]

Everything Changes: Syntax, Configuration, Semantics

In this lesson we add thread joining, one of the simplest thread
synchronization mechanisms. In doing so, we need to add unique ids
to threads in the configuration, and to modify the syntax to allow spawn
to return the id of the newly created thread. This gives us an opportunity
to make several other small syntactic and semantics changes to the language,
which make it more powerful or more compact at a rather low cost.

Before we start, let us first copy and modify the previous spawn.imp program
from Lesson 1 to make use of thread joining. Recall from Lesson 6 that in some
runs of this program the main thread completed before the child threads,
printing a possibly undesired value of x. What we want now is to assign
unique ids to the two spawned threads, and then to modify the main thread to
join the two child threads before printing. To avoid adding a new type to
the language, let's assume that thread ids are integer numbers. So we declare
two integers, t1 and t2, and assign them the two spawn commands. In order
for this to parse, we will have to change the syntax of spawn to be an
arithmetic expression construct instead of a statement. Once we do that,
we have a slight syntactic annoyance: we need to put two consecutive ;
after the spawn assignment, one for the assignment statement inside the spawn,
and another for the outer assignment. To avoid the two consecutive semicolons,
we can syntactically enforce spawn to take a block as argument, instead of a
statement. Now it looks better. The new spawn.imp program is still
non-deterministic, because the two threads can execute in any order and even
continue to have a data-race on the shared variable x, but we should see fewer
behaviors when we use the join statements. If we want to fully synchronize
this program, we can have the second thread start with a join(t1) statement.
Then we should only see one behavior for this program.

Let us now modify the language semantics. First, we move the spawn
construct from statements to expressions, and make it take a block.
Second, we add one more sub-cell to the thread cell in the configuration,
<id/>, to hold the unique identifier of the thread. We want the main
thread to have id 0, so we initialize this cell with 0. Third, we modify
the spawn rule to generate a fresh integer identifier, which is put in the
<id/> cell of the child thread and returned as a result of spawn in the
parent thread. Fourth, let us add the join statement to the language,
both syntactically and semantically. So in order for the join(T) statement
to execute, thread T must have its computation empty. However, in order
for this to work we have to get rid of the thread termination cleanup rule.
Indeed, we need to store somewhere the information that thread T terminated;
the simplest way to do it is to not remove the terminated threads. Feel free
to experiment with other possibilities, too, here. For example, you may add
another cell, <done/>, in which you can store all the thread ids of the
terminated and garbage-collected threads.
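
Putting the pieces together, and hedging on the concrete syntax, the new
spawn and join rules might be sketched as follows:

rule <k> spawn S => !T:Int ...</k> <env> Rho </env>    // spawn returns the fresh id
     (. => <thread>... <k> S </k> <env> Rho </env> <id> !T </id> ...</thread>)

rule <k> join(T:Int); => . ...</k>                     // join(T) syntax assumed
     <thread>... <k> . </k> <id> T </id> ...</thread>  // thread T has terminated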

Let us now kompile imp.k and convince ourselves that the new spawn.imp
with join statements indeed has fewer behaviors than its variant without
join statements. Also, let us convince ourselves that the fully synchronized
variant of it indeed has only one behavior.

Note that now spawn, like variable increment, makes the evaluation of
expressions have side effects. Many programming languages in fact allow
expressions to be evaluated only for their side effects, and not for their
value. This is typically done by simply adding a ; after the expression
and thus turning it into a statement. For example, ++x;. Let us also
allow arithmetic expressions in our language to be used as statements, by
simply adding the production AExp ";" to Stmt, with evaluation strategy
strict and with the expected semantics discarding the value of the AExp.
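
A sketch of this change:

syntax Stmt ::= AExp ";"  [strict]

rule _:Int; => .   // discard the value once the expression is evaluated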

Another simple change in syntax and semantics which gives our language more
power, is to remove the ; from the syntax of variable assignments and to make
them expression instead of statement constructs. This change, combined with
the previous one, will still allow us to parse all the programs that we could
parse before, but will also allow us to parse more programs. For example, we
can now do sequence assignments like in C: x = y = z = 0. The semantics
of assignment now has to return the assigned value also to the computation,
because we want the assignment expression to evaluate to the assigned value.
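
A sketch of the new assignment, now an expression construct that also yields
the assigned value:

syntax AExp ::= Id "=" AExp  [strict(2)]

rule <k> X = I:Int => I ...</k>             // the assignment evaluates to I
     <env>... X |-> N ...</env>
     <store>... N |-> (_ => I) ...</store>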

Let us also make another change, but this time one which only makes the
definition more compact. Instead of defining statement sequential
composition as a binary construct for statements, let us define a new
syntactic construct, Stmts, as whitespace-separated lists of Stmt. This
allows us to get rid of the empty blocks, because we can change the syntax of
blocks to {Stmts} and Stmts also allows the empty sequence of statements.
However, we do have to make sure that .Stmts dissolves.
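
A sketch of the new Stmts category and its sequencing rules:

syntax Stmts ::= List{Stmt,""}

rule .Stmts => .                  [structural]   // the empty sequence dissolves
rule S:Stmt Ss:Stmts => S ~> Ss   [structural]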

In general, unless you are defining a well-established programming language,
it is quite likely that your definitions will suffer lots of changes like the
ones seen in this lecture. You add a new construct, which suggests changes
to the existing syntax that in fact make your language parse more programs,
which in turn require corresponding changes in the semantics, and so on.
Also, compact definitions are desirable in general, because they are easier
to read and easier to change if needed later.

In the next lesson we wrap up and document the definition of IMP++.

Go to Lesson 8, IMP++: Wrapping up Larger Languages.

Wrapping up Larger Languages

In this lesson we wrap up IMP++'s semantics and also generate its poster.
While doing so, we also learn how to display larger configurations in order
to make them easier to read and print.

Note that we rearrange the semantics a bit, to group the semantics of the old
IMP constructs together and separate it from the new IMP++ semantics.

There is a detailed discussion at the end of the document about the
--transition option of kompile, because that is important and we want
the poster to include everything we learned in this part of the tutorial.

You can go even further and manually edit the generated LaTeX document.
You typically want to do that when you want to publish your language
definition, or parts of it, and you need to finely tune it to fit the
editing requirements. For example, you may want to insert some negative
spaces, etc.

Part 4 of the tutorial is now complete. At this moment you should know most
of K framework's features and how to use the K tool. You can now define or
design your own programming languages, and then execute and analyze programs.

MOVIE (out of date) [06'26"]

Part 5: Defining Type Systems

In this part of the tutorial we will show that defining type systems for
languages is essentially no different from defining semantics. The major
difference is that programs and fragments of programs now rewrite to their
types, instead of to concrete values. In terms of K, we will learn how
to use it for a particular but important kind of application.

Imperative, Environment-Based Type Systems

In this lesson you learn how to define a type system for an imperative
language (the IMP++ language defined in Part 4 of the tutorial), using a style
based on type environments.

Let us copy the imp.k file from Part 4 of the tutorial, Lesson 7, which holds
the semantics of IMP++, and modify it into a type system. The resulting type
system, when executed, yields a type checker.

We start by defining the new strictness attributes of the IMP++ syntax.
While doing so, remember that programs and fragments of programs now reduce
to their types. So types will be the new results of our new (type) semantics.
We also clean up the semantics by removing the unnecessary tags, and also
use strict instead of seqstrict wherever possible, because strict gives
implementations more freedom. Interestingly, note that spawn is strict now,
because the code of the child thread should type in the current parent's type
environment. Note that this is not always the case for threads, see for example
SIMPLE in the languages tutorial, but it works here for our simpler IMP++.

From a typing perspective, the && construct is strict in both its arguments;
its short-circuit (concrete) semantics is irrelevant for its (static) type
system. Similarly, both the conditional and the while loop are strict
constructs when regarded through the typing lenses.

Finally, the sequential composition is now sequentially strict! Indeed,
statements are now going to reduce to their type, stmt, and it is critical
for sequential composition to type its argument statements left-to-right;
for example, imagine that the second argument is a variable declaration (whose
type semantics will modify the type environment).

We continue by defining the new results of computations, that is, the actual
types. In this simple imperative language, we only have a few constant types:
int, bool, string, block and stmt.

We next define the new configuration, which is actually quite simple. Besides
the <k/> cell, all we need is a type environment cell, <tenv/>, which will
hold a map from identifiers to their types. A type environment is therefore
like a state in the abstract domain of type values.
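
Concretely, the types and the configuration might be declared along these
lines (a sketch, with cell colors omitted and assuming programs are Stmts,
as in Lesson 7 of Part 4):

syntax Type ::= "int" | "bool" | "string" | "block" | "stmt"
syntax KResult ::= Type            // types are the new results

configuration <T>
                <k> $PGM:Stmts </k>
                <tenv> .Map </tenv>
              </T>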

Let us next modify the semantic rules, turning them into a type system. In
short, the idea is to reduce the basic values to their types, and then have a
rule for each language construct reducing it to its result type whenever its
arguments have the expected types.

We write the rules in the order given by the syntax declarations, to make
sure we do not forget any construct.

Integers reduce to their type, int.

So do the strings.

Variables are now looked up in the type environment and reduced to their type
there. Since we only declare integer variables in IMP++, their type in tenv
will always be int. Nevertheless, we write the rule generically, so that we
would not have to change it later if we add other type declarations to IMP++.
Note that we reject programs which lookup undeclared variables. Rejection,
in this case, means rewriting getting stuck.
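
These first rules are simple enough to sketch directly:

rule _:Int => int
rule _:String => string

rule <k> X:Id => T ...</k> <tenv>... X |-> T ...</tenv>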

Variable increment types to int, provided the variable has type int.

Read types to int, because we only allow integer input.

Division is only allowed on integers, so it rewrites to int provided that its
arguments rewrite to int. Note, however, that in order to write int / int,
we have to explicitly add int to the syntax of arithmetic expressions.
Otherwise, the K parser rightfully complains, because / was declared on
arithmetic expressions, not on types. One simple and generic way to allow
types to appear anywhere, is to define Type as a syntactic subcategory of all
the other syntactic categories. Let's do it on a by-need basis, though.

Addition is overloaded, so we add two typing rules for it: one for integers
and another for strings.
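
A sketch of these rules, with Type added to AExp on a by-need basis as
discussed above:

syntax AExp ::= Type

rule int / int => int
rule int + int => int
rule string + string => string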

As discussed, spawn types to stmt provided that its argument types to
block.

The assignment construct was strict(2); its typing policy is that the declared
type of X should be identical to the type of the assigned value. Like for
lookup, we define this rule more generically than needed for IMP++, for any
type, not only for int.
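
Using nonlinear matching on T, the typing rule might be sketched as:

rule <k> X = T:Type => T ...</k> <tenv>... X |-> T ...</tenv>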

The typing rules for Boolean expression constructs are in the same spirit.
Note that we need only one rule for &&.

The typing of blocks is a bit trickier. First, note that we still need to
recover the environment after the block is typed, because we do not want the
block-local variables to be visible in the outer type environment. We recover
the type environment only after the block-enclosed statements type; moreover,
we also opportunistically yield a block type on the computation when we
discard the type environment recovery item. To account for the fact that the
block-enclosed statement can itself be a block (e.g., {{S}}), we would need an
additional rule. Since we do not like repetition, we instead group the types
block and stmt into one syntactic category, BlockOrStmtType, and now we
can have only one rule. We also include BlockOrStmtType in Type, as a
replacement for the two basic types.
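
One plausible rendering of this scheme (a sketch; the official definition may
differ in details), with Type redeclared so that BlockOrStmtType replaces the
two basic types:

syntax BlockOrStmtType ::= "block" | "stmt"
syntax Type ::= "int" | "bool" | "string" | BlockOrStmtType

rule <k> {Ss} => Ss ~> Rho ...</k> <tenv> Rho </tenv>   // save the type environment

rule <k> (_:BlockOrStmtType ~> Rho:Map) => block ...</k>
     <tenv> _ => Rho </tenv>                            // restore it, yield block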

The expression statement types as expected. Recall that we only allow
arithmetic expressions, which type to int, to be used as statements in IMP++.

The conditional was declared strict in all its arguments. Its typing policy
is that its first argument types to bool and its two branches to block.
If that is the case, then it yields a stmt type.

For while, its first argument should type to bool and its second to block.
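
Sketches of these two rules, assuming IMP's concrete syntax:

rule if (bool) block else block => stmt
rule while (bool) block => stmt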

Variable declarations add new bindings to the type environment. Recall that
we can only declare variables of integer type in IMP++.
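
A sketch of the declaration typing rules:

rule <k> int (X,Xs => Xs); ...</k> <tenv> Rho => Rho[X <- int] </tenv>

rule int .Ids; => stmt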

The typing policy of print is that it can only print integer or string values,
and in that case it types to stmt. Like for BlockOrStmtType, to avoid
having two similar rules, one for int and another for string, we prefer to
introduce an additional syntactic category, PrintableType, which includes both
int and string types.

halt types to stmt; so its subsequent code is also typed.

join types to stmt, provided that its argument types to int.
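
Sketches of these two rules, assuming join takes its argument in parentheses
and is strict:

rule halt; => stmt
rule join(int); => stmt   // join(...) syntax assumed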

Sequential composition was declared as a whitespace-separated sequentially
strict list. Its typing policy is that all the statements in the list should
type to stmt or block in order for the list to type to stmt. Since
lists are maintained internally as cons-lists, this is probably the simplest
way to do it:

rule .Stmts => stmt
rule _:BlockOrStmtType Ss => Ss

Note that the first rule, which types the empty sequence of statements to stmt,
is needed anyway, to type empty blocks {} (together with the block rule).

kompile imp.k and krun all the programs in Part 4 of the tutorial. They
should all type to stmt.

In the next lesson we will define a substitution-based type system for LAMBDA.

Go to Lesson 2, Type Systems: Substitution-Based Higher-Order Type Systems.

MOVIE (out of date) [10'11"]

Substitution-Based Higher-Order Type Systems

In this lesson you learn how to define a substitution-based type system for
a higher-order language, namely the LAMBDA language defined in Part 1 of the
tutorial.

Let us copy the definition of LAMBDA from Part 1 of the tutorial, Lesson 8.
We are going to modify it into a type system for LAMBDA.

Before we start, it is worth clarifying an important detail, namely that
our type system will yield a type checker when executed, not a type
inferencer. In particular, we are going to change the LAMBDA syntax
to allow us to associate a type to each declared variable. The
constructs which declare variables are lambda, let, letrec and mu.
The syntax of all these will therefore change.

Since here we are not interested in a LAMBDA semantics anymore, we take the
liberty of eliminating the Val syntactic category, our previous results.
Our new results are going to be the types, because programs will now reduce
to their types.

As explained, the syntax of the lambda construct needs to change, to also
declare the type of the variable that it binds. We add the new syntactic
category Type, with the following constructs: int, bool, the function
type (which gives it its higher-order status), and parentheses as bracket.
Also, we make types our K results.
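
A sketch of these declarations (assuming the function type arrow is
right-associative):

syntax Type ::= "int" | "bool"
              | Type "->" Type   [right]   // associativity assumed
              | "(" Type ")"     [bracket]

syntax KResult ::= Type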

We are now ready to define the typing rules.

Let us start with the typing rule for lambda abstraction: lambda X : T . E
types to the function type T -> T', where T' is the type obtained by further
typing E[T/X]. This can be elegantly achieved by reducing the lambda
abstraction to T -> E[T/X], provided that we extend the function type construct
to take expressions, not only types, as arguments, and to be strict.
This can be easily achieved by redeclaring it as a strict expression construct
(strictness in the second argument would suffice in this example, but it is
more uniform to define it strict overall).
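
In code, this amounts to something like the following sketch (Type must also
be subsorted to Exp for such rules to parse, as discussed below):

syntax Exp ::= Type
             | Exp "->" Exp  [strict]   // arrow extended to expressions

rule lambda X : T . E => T -> E[T/X]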

The typing rule for application is as simple as it can get: (T1->T2) T1 => T2.

Let us now give the typing rules of arithmetic and Boolean expression
constructs. First, let us get rid of Val. Second, rewrite each value to its
type, similarly to the type system for IMP++ in the previous lesson. Third,
replace each semantic rule by its typing rule. Fourth, make sure you
do not forget to subsort Type to Exp, so your rules above will parse.
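
For example, the typing rules for a few arithmetic and comparison constructs
might be sketched as:

rule int * int => int
rule int / int => int
rule int + int => int
rule int <= int => bool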

The typing policy of the conditional statement is that its first argument
should type to bool and its other two arguments should type to the same type
T, which will also be the result type of the conditional. So we make the
conditional construct strict in all its three arguments and we write the
obvious rule: if bool then T:Type else T => T. We want a runtime check that
the latter arguments are actually typed, so we write T:Type.

There is nothing special about let, except that we have to make sure we
change its syntax to account for the type of the variable that it binds.
This rule is a macro, so the let is desugared statically.
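
A sketch of the desugaring macro:

rule let X : T = E in E' => (lambda X : T . E') E  [macro]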

Similarly, the syntax of letrec and mu needs to change to account for the
type of the variable that they bind. The typing of letrec remains based on
its desugaring to mu; we have to make sure the types are also included now.

The typing policy of mu is that its body should type to the same type T of
its variable, which is also the type of the entire mu expression. This can
be elegantly achieved by rewriting it to (T -> T) E[T/X]. Recall that
application is strict, so E[T/X] will be eventually reduced to its type.
Then the application types correctly only if that type is also T, and in
that case the result type will also be T.
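
That is, in sketch form:

rule mu X : T . E => (T -> T) E[T/X]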

kompile and krun some programs. You can, for example, take the LAMBDA
programs from the first tutorial, modify them by adding types to their
variable declarations, and then type check them using krun.

In the next lesson we will discuss an environment-based type system
for LAMBDA.

Go to Lesson 3, Type Systems: Environment-Based Higher-Order Type Systems.

MOVIE (out of date) [6'52"]

Environment-Based Higher-Order Type Systems

In this lesson you learn how to define an environment-based type system for
a higher-order language, namely the LAMBDA language defined in Part 1 of the
tutorial.

The simplest and fastest way to proceed is to copy the substitution-based
type system of LAMBDA from the previous lesson and modify it into an
environment-based one. A large portion of the substitution-based definition
will remain unchanged. We only have to modify the rules that use
substitution.

We do not need the substitution anymore, so we can remove the require and
import statements. The syntax of types and expressions stays unchanged, but
we can now remove the binder tag of lambda.

Like in the type system of IMP++ in Lesson 1, we need a configuration that
contains, besides the <k/> cell, a <tenv/> cell that will hold the type
environment.

In an environment-based definition, unlike in a substitution-based one, we
need to lookup variables in the environment. So let us start with the
type lookup rule:

rule <k> X:Id => T ...</k> <tenv>... X |-> T ...</tenv>

The type environment is populated by the semantic rule of lambda:

rule <k> lambda X : T . E => (T -> E) ~> Rho ...</k>
     <tenv> Rho => Rho[X <- T] </tenv>

So X is bound to its type T in the type environment, and then T -> E
is scheduled for processing. Recall that the arrow type construct has been
extended into a strict expression construct, so E will be eventually reduced
to its type. Like in other environment-based definitions, we need to make
sure that we recover the type environment after the computation in the scope
of the declared variable terminates.

The typing rule of application does not change, so it stays as elegant as it
was in the substitution-based definition:

rule (T1 -> T2) T1 => T2

So do the rules for the arithmetic and Boolean constructs, and those for
if, let, and letrec.

The mu rule needs to change, because it was previously defined using
substitution. We modify it in the same spirit as we modified the lambda
rule: bind X to its type in the environment, schedule its body for typing
in its right context, and then recover the type environment.
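
One plausible sketch, mirroring the lambda rule:

rule <k> mu X : T . E => (T -> T) E ~> Rho ...</k>   // (T -> T) E enforces E : T
     <tenv> Rho => Rho[X <- T] </tenv>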

Finally, we give the semantics of environment recovery, making sure
the environment is recovered only after the preceding computation is
reduced to a type:

rule <k> _:Type ~> (Rho:Map => .) ...</k> <tenv> _ => Rho </tenv>

The changes that we applied to the substitution-based definition were
therefore quite systematic: each substitution invocation was replaced with
an appropriate type environment update/recovery.

Go to Lesson 4, Type Systems: A Naive Substitution-Based Type Inferencer.

A Naive Substitution-Based Type Inferencer

In this lesson you learn how to define a naive substitution-based type
inferencer for a higher-order language, namely the LAMBDA language
defined in Part 1 of the tutorial.

Unlike in the type checker defined in Lessons 2 and 3, where we had to
associate a type with each declared variable, a type inferencer
attempts to infer the types of all the variables from the way those
variables are used. Let us take a look at this program, say plus.lambda:

lambda x . lambda y . x + y

Since x and y are used in an integer addition context, we can infer
that they must have the type int and the result of the addition is
also an int, so the type of the entire expression is int -> int -> int.
Similarly, the program if.lambda

lambda x . lambda y . lambda z .
  if x then y else z

can only make sense when x has type bool and y and z have the same
type, say t, in which case the type of the entire expression is
bool -> t -> t -> t. Since the type t can be anything, we say that
the type of this expression is polymorphic. That means that the code
above can be used in different contexts, where t can be an int, a
bool, a function type int -> int, and so on.

In the identity.lambda program

let f = lambda x . x
in f 1

f has such a polymorphic type, which is then applied to an integer,
so this program is type-safe and its type is int.

A typical polymorphic expression is the composition (composition.lambda)

lambda f . lambda g . lambda x .
  g (f x)

which has the type (t1 -> t2) -> (t2 -> t3) -> (t1 -> t3), polymorphic
in 3 types.

Let us now define our naive type inferencer and then we discuss more
examples. The idea is quite simple: we conceptually do the same
operations like we did within the type checker defined in Lesson 2,
with two important differences:

  1. instead of declaring a type with each declared variable, we assume
    a fresh type for that variable; and
  2. instead of checking that the types of expressions satisfy the
    type properties of the context in which they are used, we impose
    those properties as type equality constraints. A general-purpose
    unification-based constraint solving mechanism is then used to solve
    the generated type constraints.

Let us start with the syntax, which is essentially identical to that
of the type checker in Lesson 2, except that bound variables are not
declared a type anymore. Also, to keep things more compact, we put
all the Exp syntax declarations in one syntax declaration this time.

Before we modify the rules, let us first define our machinery for
adding and solving constraints. First, we require and import the
unification procedure. We do not discuss unification here, but if you
are interested you can consult the unification.k files under
k-distribution/include/kframework/builtin, which contains our current generic
definition of unification, which is written also in K. The generic unification
provides a sort, Mgu, for most-general-unifier, an operation
updateMgu(Mgu,T1,T2) which updates Mgu with additional constraints
generated by forcing the terms T1 and T2 to be equal, and an operation
applyMgu(Mgu,T) which applies Mgu to term T. For our use
of unification here, we do not even need to know how Mgu terms are
represented internally.

We define a K item construct, =, which takes two Type terms and
enforces them to be equal by means of updating the current Mgu.
Once the constraints are added to the Mgu, the equality dissolves
itself. With this semantics of = in mind, we can now go ahead and
modify the rules of the type checker systematically into rules
for a type inferencer. The changes are self-explanatory and
mechanical: for example, the rule

rule int * int => int

changes into the rule

rule T1:Type * T2:Type => T1 = int ~> T2 = int ~> int

generating the constraints that the two arguments of multiplication
have the type int, and the result type is int. Recall that each type
equality on the <k/> cell updates the current Mgu appropriately and
then dissolves itself; thus, the above says that after imposing the
constraints T1=int and T2=int, multiplication yields a type int.
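
For reference, here is a plausible sketch of the = construct itself, assuming
the configuration holds the current Mgu in an <mgu/> cell:

syntax KItem ::= Type "=" Type

rule <k> T1:Type = T2:Type => . ...</k>                 // dissolve once recorded
     <mgu> Theta:Mgu => updateMgu(Theta, T1, T2) </mgu> // add the constraint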

As mentioned above, since types of variables are not declared anymore,
but inferred, we have to generate a fresh type for each variable at its
declaration time, and then generate appropriately constraints for it.
For example, the type semantics of lambda and mu become:

rule lambda X . E => T -> E[T/X]  when fresh(T:Type)
rule mu X . E => (T -> T) E[T/X]  when fresh(T:Type)

that is, we add a condition stating that the previously declared type
is now a fresh one. This type will be further constrained by how the
variable X is being used within E.

Interestingly, the previous typing rule for lambda application is not
powerful enough anymore. Indeed, since types are not given anymore,
it may very well be the case that the inferred type of the first
argument of the application construct is not yet a function type
(remember, for example, the program composition.lambda above). What
we have to do is to enforce it to be a function type, by means of
fresh types and constraints. We can introduce a fresh type for the
result of the application, and then write the expected rule as
follows:

rule T1:Type T2:Type => T1 = (T2 -> T) ~> T  when fresh(T:Type)

Similarly, the conditional requires that its first argument is a
bool and that its second and third arguments have the same type,
which is also the result type.
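
In rule form, this can be written as follows (a sketch, assuming the
conditional was declared strict in all three arguments):

rule if T:Type then T1:Type else T2:Type => T = bool ~> T1 = T2 ~> T1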

The macros do not change; in particular, let is still desugared into
a lambda application. We will shortly see that this is a significant
restriction, because it limits the polymorphism of our type system.

We are done. We have a working type inferencer for LAMBDA.

Let's kompile it and krun the programs above. They all work as
expected. Let us also try some additional programs, to push it to its
limits.

First, let us test mu by means of a letrec example:

letrec f x = 3
in f

We can also try all the programs from our first tutorial, on lambda,
for example the factorial program:

letrec f x = if x <= 1 then 1 else (x * (f (x + -1)))
in (f 10)

Those programs are simple enough that they should all work as
expected with our naive type inferencer here.

Let us next try to type some tricky programs, which involve more
complex and indirect type constraints.

tricky-1.lambda:

lambda f . lambda x . lambda y . (
  (f x y) + x + (let x = y in x)
)

tricky-2.lambda:

lambda x .
  let f = lambda y . if true then y else x
  in (lambda x . f 0)

tricky-3.lambda:

lambda x . let f = lambda y . if true then x 7 else x y
           in f

tricky-4.lambda:

lambda x . let f = lambda x . x
           in let d = (f x) + 1
              in x

tricky-5.lambda:

lambda x . let f = lambda y . x y
           in let z = x 0 in f

It is now time to see the limitations of this naive type inferencer.
Consider the program

let id = lambda x . x
in if (id true) then (id 1) else (id 2)

Our type inferencer fails gracefully with a clash in the <mgu/> cell
between int and bool. Indeed, the desugaring macro of let turns it
into a lambda and an application, which further enforce id to have a
type of the form t -> t for some fresh type t. The first use of id
in the condition of if will then constrain t to be bool, while the
other uses in the two branches will enforce t to be int. Thus the
clash in the <mgu/> cell.

Similarly, the program

let id = lambda x . x
in id id

yields a different kind of conflict: if id has type t -> t, in order
to apply id to itself it must be the case that its argument, t, equals
t -> t. These two type terms cannot be unified because there is a
circular dependence on t, so we get a cycle in the <mgu/> cell.

Both limitations above will be solved when we change the semantics of
let later on, to account for the desired polymorphism.

Before we conclude this lesson, let us see one more interesting
example, where the lack of let-polymorphism leads not to a type error,
but to a less generic type:

let f1 = lambda x . x in
  let f2 = f1 in
    let f3 = f2 in
      let f4 = f3 in
        let f5 = f4 in
          if (f5 true) then f2 else f3

Our current type inferencer will infer the type bool -> bool for the
program above. Nevertheless, since all functions f1, f2, f3, f4, f5
are the identity function, which is polymorphic, we would expect the
entire program to type to the same polymorphic identity function type.

This limitation will also be addressed when we define our
let-polymorphic type inferencer.

Before that, in the next lesson we will show how easily we can turn
the naive substitution-based type inferencer discussed in this lesson
into a similarly naive, but environment-based type inferencer.

Go to Lesson 5, Type Systems: A Naive Environment-Based Type Inferencer.

A Naive Environment-Based Type Inferencer

In this lesson you learn how to define a naive environment-based type
inferencer for a higher-order language. Specifically, we take the
substitution-based type inferencer for LAMBDA defined in Lesson 4 and
turn it into an environment-based one.

Recall from Lesson 3, where we defined an environment-based type
checker for LAMBDA based on the substitution-based one in Lesson 2,
that the transition from a substitution-based definition to an
environment-based one was quite systematic and mechanical: each
substitution occurrence E[T/X] is replaced by E, but at the same time
the variable X is bound to type T in the type environment. One benefit
of using type environments instead of substitution is that we replace
a linear complexity operation (the substitution) with a constant
complexity one (the variable lookup).

There is not much left to say that has not already been said in
Lesson 3: we remove the unnecessary binder annotations for the
variable binding operations, then add a <tenv/> cell to the
configuration to hold the type environment, then add a new rule for
variable lookup, and finally apply the transformation of substitutions
E[T/X] into E as explained above.
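
For example, the new variable lookup rule can be as simple as the
following sketch, assuming the <tenv/> cell holds a map from
variables to their types:

rule <k> X:Id => T ...</k>
     <tenv>... X |-> T ...</tenv>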

The resulting type inferencer should now work exactly the same way as
the substitution-based one, except, of course, that the resulting
configurations will contain a <tenv/> cell now.

As a sanity check, let us consider two more LAMBDA programs that test
the static scoping nature of the inferencer. We do that because
faulty environment-based definitions often have this problem. The
program

let x = 1
in let f = lambda a . x
   in let x = true
      in f 3

should type to int, not to bool, and so it does. Similarly, the
program

let y = 0
in letrec f x = if x <= 0
                then y
                else let y = true
                     in f (x + 1)
   in f 1

should also type to int, not bool, and so it does, too.

The type inferencer defined in this lesson has the same limitations,
in terms of polymorphism, as the one in Lesson 4. In the next
lesson we will see how it can be parallelized, and in further lessons
how to make it polymorphic.

Go to Lesson 6, Type Systems: Parallel Type Checkers/Inferencers.

Parallel Type Checkers/Inferencers

In this lesson you learn how to define parallel type checkers or
inferencers. To have a concrete example, we will parallelize the one
in the previous lesson, but the ideas are general. We use the same
idea to define type checkers for other languages in the K tool
distribution, such as SIMPLE and KOOL.

The idea is in fact quite simple. Instead of one monolithic typing
task, we generate many smaller tasks, which can be processed in
parallel. We use the same approach to define parallel semantics as we
used for threads in IMP++ in Part 4 of the tutorial, that is, we add a
cell holding all the parallel tasks, making sure we declare the cell
holding a task with multiplicity *. For the particular type
inferencer that we chose here, the one in Lesson 5, each task will
hold an expression to type together with a type environment (so it
knows where to look up its free variables). We thus have the
following configuration:

configuration <tasks color="yellow">
                <task color="orange" multiplicity="*">
                  <k color="green"> $PGM:Exp </k>
                  <tenv color="red"> .Map </tenv>
                </task>
              </tasks>
              <mgu color="blue"> .Mgu </mgu>

Now we have to take each typing rule we had before and change it to
yield parallel typing. For example, our rule for typing
multiplication was the following in Lesson 5:

rule T1:Type * T2:Type => T1 = int ~> T2 = int ~> int

Since * was strict, its two arguments eventually type, and once that
happens the rule above fires. Unfortunately, the strictness of
multiplication makes the typing of the two expressions sequential in
our previous definition. To avoid typing the two expressions
sequentially and instead generating two parallel tasks, we remove the
strict attribute of multiplication and replace the rule above with the
following:

rule <k> E1 * E2 => int ...</k> <tenv> Rho </tenv>
     (. => <task> <k> E1 = int </k> <tenv> Rho </tenv> </task>
           <task> <k> E2 = int </k> <tenv> Rho </tenv> </task>)

Therefore, we generate two tasks for typing E1 and E2 in the same type
environment as the current task, and let the current task continue by
simply optimistically reducing E1*E2 to its expected result type, int.
If E1 or E2 does not type to int, then either its corresponding task
will get stuck or the <mgu/> cell will end up with a clash or a cycle,
so the program will not type overall, despite the fact that we
allowed the task containing the multiplication to continue. This is
how we obtain maximal parallelism in this case.

Before we continue, note that the new tasks hold equalities whose
first argument is an expression, while previously the equality
construct was declared to take types. What we want now is for the
equality construct to take arbitrary expressions, first type them,
and then generate the type constraint like before. This can be done
very easily by just extending the equality construct to expressions
and declaring it strict:

syntax KItem ::= Exp "=" Exp  [strict]

Unlike before, where we only passed types to the equality construct,
we now need a runtime check that its arguments are indeed types before
we can generate the updateMgu command:

rule <k> T:Type = T':Type => . ...</k>
     <mgu> Theta:Mgu => updateMgu(Theta,T,T') </mgu>

Like before, an equality will therefore update the <mgu/> cell and
then dissolve itself, leaving the <k/> cell of the corresponding task
empty. Such empty tasks are unnecessary, so they can be erased:

rule <task>... <k> . </k> ...</task> => .

We can now follow the same style as for multiplication to write the
parallel typing rules of the other arithmetic constructs, and even for
the conditional.
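
For example, the conditional can be handled with a rule like the
following sketch (dropping its strictness, as we did for
multiplication):

rule <k> if B then E1 else E2 => T ...</k> <tenv> Rho </tenv>
     (. => <task> <k> B = bool </k> <tenv> Rho </tenv> </task>
           <task> <k> E1 = T </k> <tenv> Rho </tenv> </task>
           <task> <k> E2 = T </k> <tenv> Rho </tenv> </task>)
  when fresh(T:Type)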

To parallelize the typing of lambda we generate two fresh types, one
for the variable and one for the body, and make sure that we generate
the correct type constraint and environment in the body task:

rule <k> lambda X . E => Tx -> Te ...</k> <tenv> TEnv </tenv>
     (. => <task> <k> E = Te </k> <tenv> TEnv[Tx/X] </tenv> </task>)
  when fresh(Tx:Type) andBool fresh(Te:Type)

Note that the rule above also spares us from having to change and
then recover the environment of the current task.

For function application we also need to generate two fresh types:

rule <k> E1 E2 => T ...</k> <tenv> Rho </tenv>
     (. => <task> <k> E1 = T2 -> T </k> <tenv> Rho </tenv> </task>
           <task> <k> E2 = T2 </k> <tenv> Rho </tenv> </task>)
  when fresh(T2:Type) andBool fresh(T:Type)

The only rule left is that of mu X . E. In this case we only need
one fresh type, because X, E, and mu X . E all have the same type:

rule <k> mu X . E => T ...</k>  <tenv> TEnv </tenv>
     (. => <task> <k> E = T </k> <tenv> TEnv[T/X] </tenv> </task>)
  when fresh(T:Type)

We do not need the type environment recovery operation, so we delete it.

We can now kompile and krun all the programs that we typed in Lesson 5.
Everything should work.

In this lesson we only aimed at parallelizing the type inferencer in
Lesson 5, not at improving its expressiveness; it still has the same
limitations in terms of polymorphism. The next lessons are dedicated
to polymorphic type inferencers.

Go to Lesson 7, Type Systems: A Naive Substitution-based Polymorphic Type Inferencer.

A Naive Substitution-based Polymorphic Type Inferencer

In this lesson you learn how little it takes to turn a naive monomorphic
type inferencer into a naive polymorphic one, basically only changing
a few characters. In terms of the K framework, you will learn that
you can have complex combinations of substitutions in K, both over
expressions and over types.

Let us start directly with the change. All we have to do is to take
the LAMBDA type inferencer in Lesson 4 and only change the macro

rule let X = E in E' => (lambda X . E') E  [macro]

as follows:

rule let X = E in E' => E'[E/X]  [macro]

In other words, we are inlining the beta-reduction rule of
lambda-calculus within the original rule. In terms of typing, the
rule above forces the type inferencer to type E in place for each
occurrence of X in E'. Unlike with the first rule, where X had to get
a single type satisfying the constraints of all of X's occurrences in
E', we now never associate any type to X at all.

Let us kompile and krun some examples. Everything that worked with
the type inferencer in Lesson 4 should still work here, although the
types of some programs can now be more general. For example, reconsider
the nested-lets.lambda program

let f1 = lambda x . x in
  let f2 = f1 in
    let f3 = f2 in
      let f4 = f3 in
        let f5 = f4 in
          if (f5 true) then f2 else f3

which was previously typed to bool -> bool. With the new rule above,
the sequence of lets is iteratively eliminated and we end up with the
program

if (lambda x . x) true then (lambda x . x) else (lambda x . x)

which now types (with both type inferencers) to a type of the form
t -> t, for some type variable t, which is more general than the
previous bool -> bool type that the program typed to in Lesson 4.

We can also now type programs that were not typable before, such as

let id = lambda x . x
in if (id true) then (id 1) else (id 2)

and

let id = lambda x . x
in id id

Let us also test it on some trickier programs, also not typable
before, such as

let f = lambda x . x
in let g = lambda y . f y
   in g g

which gives us a type of the form t -> t for some type variable t,
and

let f = let g = lambda x . x
        in let h = lambda x . lambda x . (g g g g)
           in h
in f

which types to t1 -> t2 -> t3 -> t3 for some type variables t1, t2, t3.

Here is another program which was not typable before, which is
trickier than the others above in that a lambda-bound variable appears
free in a let-bound expression:

lambda x . (
  let y = lambda z . x
  in if (y true) then (y 1) else (y (lambda x . x))
)

The above presents no problem now, because once lambda z . x gets
substituted for y we get a well-typed expression which yields that x
has the type bool, so the entire expression types to bool -> bool.

The cheap type inferencer that we obtained above therefore works as
expected. However, it has two problems which justify a more advanced
solution. First, substitution is typically considered an elegant
mathematical instrument which is not too practical in implementations,
so an implementation of this type inferencer will likely be based on
type environments anyway. Additionally, we mix two kinds of
substitutions in this definition, one where we substitute types and
another where we substitute expressions, which can only make things
harder to implement efficiently. Second, our naive substitution of E
for X in E' can yield an exponential explosion in the size of the
original program. Consider, for example, the following classic
example, which is known to generate a type whose size is exponential
in the size of the program (and is thus used as an argument for why
let-polymorphic type inference is exponential in the worst case):

let f00 = lambda x . lambda y . x in
  let f01 = lambda x . f00 (f00 x) in
    let f02 = lambda x . f01 (f01 x) in
      let f03 = lambda x . f02 (f02 x) in
        let f04 = lambda x . f03 (f03 x) in
          // ... you can add more nested lets here
          f04

The particular instance of the pattern above generates a type with
17 distinct type variables! The desugaring of each let doubles the
size of the program and of its resulting type, so the type of f0n
contains 2^n + 1 type variables. While such programs are unlikely to
appear in practice, it is often the case that functions can be quite
complex and large while their type is quite simple in the end, so we
should simply avoid retyping each function each time it is used.

This is precisely what we will do next. Before we present the classic
let-polymorphic type inferencer in Lesson 9, which is based on
environments, we first quickly discuss in Lesson 8 an intermediate
step, namely a naive environment-based variant of the inferencer
defined here.

Go to Lesson 8, Type Systems: A Naive Environment-based Polymorphic Type Inferencer.

A Naive Environment-based Polymorphic Type Inferencer

In this short lesson we discuss how to quickly turn a naive
environment-based monomorphic type inferencer into a naive let-polymorphic
one. Like in the previous lesson, we only need to change a few
characters. In terms of the K framework, you will learn how to have
both environments and substitution in the same definition.

Like in the previous lesson, all we have to do is to take the LAMBDA
type inferencer in Lesson 5 and only change the macro

rule let X = E in E' => (lambda X . E') E  [macro]

as follows:

rule let X = E in E' => E'[E/X]  [macro]

The reasons why this works have already been explained in the previous
lesson, so we do not repeat them here.

Since our new let macro uses substitution, we have to require the
substitution module at the top and also import SUBSTITUTION in the
current module, besides the already existing UNIFICATION.
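
Schematically, the top of the definition then looks something like
the following (the exact require paths depend on the K distribution,
so treat them as placeholders):

require "substitution.k"
require "unification.k"

module LAMBDA
  imports SUBSTITUTION
  imports UNIFICATION
  ...
endmodule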

Everything which worked with the type inferencer in Lesson 7 should
also work now. Let us only try the exponential type example,

let f00 = lambda x . lambda y . x in
  let f01 = lambda x . f00 (f00 x) in
    let f02 = lambda x . f01 (f01 x) in
      let f03 = lambda x . f02 (f02 x) in
        let f04 = lambda x . f03 (f03 x) in
          f04

As expected, this gives us precisely the same type as in Lesson 7.

So the only difference between this type inferencer and the one in
Lesson 7 is that substitution is only used for LAMBDA-to-LAMBDA
transformations, but not for infusing types within LAMBDA programs.
Thus, the syntax of LAMBDA programs is preserved intact, which some
may prefer. Nevertheless, this type inferencer is still expensive and
wasteful, because the let-bound expression is typed over and over
again in each place where the let-bound variable occurs.

In the next lesson we will discuss a type inferencer based on the
classic Damas-Hindley-Milner type system, which maximizes the reuse of
typing work by means of parametric types.

Go to Lesson 9, Type Systems: Let-Polymorphic Type Inferencer (Damas-Hindley-Milner).

Let-Polymorphic Type Inferencer (Damas-Hindley-Milner)

In this lesson we discuss a type inferencer based on what we call today
the Damas-Hindley-Milner type system, which is at the core of many
modern functional programming languages. The first variant of it was
proposed by Hindley in 1969, then, interestingly, Milner rediscovered
it in 1978 in the context of the ML language. Damas formalized it as
a type system in his PhD thesis in 1985. More specifically, our type
inferencer here, like many others as well as many implementations of
it, follows more closely the syntax-driven variant proposed by Clement
in 1987.

In terms of K, we will see how easily we can turn one definition which
is considered naive (our previous type inferencer in Lesson 8) into a
definition which is considered advanced. All we have to do is to
change one existing rule (the rule of the let binder) and to add a new
one. We will also learn some new predefined features of K, which make
the above possible.

The main idea is to replace the rule

rule let X = E in E' => E'[E/X]  [macro]

which creates potentially many copies of E within E', with a rule
that types E once and then reuses that type in each place where X
occurs free in E'. The simplest K way to type E is to declare the
let construct strict(2). Now we cannot simply bind X to the type
of E, because we would obtain a variant of the naive type inferencer
we already discussed, together with its limitations, in Lesson 5 of this
tutorial. The trick here is to parameterize the type of E in all its
unconstrained fresh types, and then create fresh copies of those
parameters in each free occurrence of X in E'.

Let us discuss some examples, before we go into the technical details.
Consider the first let-polymorphic example which failed to be typed
with our first naive type-inferencer:

let id = lambda x . x
in if (id true) then (id 1) else (id 2)

When typing lambda x . x, we get a type of the form t -> t, for some
fresh type t. Instead of assigning this type to id as we did in the
naive type inferencers, we now first parameterize this type in its
fresh variable t, written

(forall t) t -> t

and then bind id to this parametric type. The intuition for the
parameter is that it can be instantiated with any other type, so this
parametric type stands, in fact, for infinitely many non-parametric
types. This is similar to what happens in formal logic proof systems,
where rule schemas stand for infinitely many concrete instances of
them. For this reason, parametric types are also called type schemas.

Now each time id is looked up within the let-body, we create a fresh
copy of the parameter t, which can thus be independently
constrained by each local context. Let's suppose that the three id
lookups yield the types t1 -> t1, t2 -> t2, and t3 -> t3,
respectively.
Then t1 will be constrained to be bool, and t2 and t3 to be int,
so we can now safely type the program above to int.

Therefore, a type schema comprises a summary of all the typing work
that has been done for typing the corresponding expression, and an
instantiation of its parameters with fresh copies represents an
elegant way to reuse all that typing work.

There are some subtleties regarding what fresh types can be made
parameters. Let us consider another example, discussed as part of
Lesson 7 on naive let-polymorphism:

lambda x . (
  let y = lambda z . x
  in if (y true) then (y 1) else (y (lambda x . x))
)

This program should type to bool -> bool, as explained in Lesson 7.
The lambda construct will bind x to some fresh type tx. Then the
let-bound expression lambda z . x types to tz -> tx for some
additional fresh type tz. The question now is what should the
parameters of this type be when we generate the type schema? If we
naively parameterize in all fresh variables, that is in both tz and
tx obtaining the type schema (forall tz,tx) tz -> tx, then there will
be no way to infer that the type of x, tx, must be a bool! The
inferred type of this expression would then wrongly be tx -> t for
some fresh types tx and t. That's because the parameters are replaced
with fresh copies in each occurrence of y, and thus their relationship
to the original x is completely lost. This tells us that we cannot
parameterize in all fresh types that appear in the type of the
let-bound expression. In particular, we cannot parameterize in those
types to which some variables are already bound in the current type
environment (like x is bound to tx in our example above).
In our example, the correct type schema is (forall tz) tz -> tx,
which now allows us to correctly infer that tx is bool.

Let us now discuss another example, which should fail to type:

lambda x .
  let f = lambda y . x y
  in if (f true) then (f 1) else (f 2)

This should fail to type because lambda y . x y is equivalent to x,
so the conditional imposes conflicting constraints on x: it should
be a function whose argument is a bool and also one whose argument is
an int. Let us try to
type it using our currently informal procedure. Like in the previous
example, x will be bound to a fresh type tx. Then the let-bound
expression types to ty -> tz with ty and tz fresh types, adding also
the constraint tx = ty -> tz. What should the parameters of this type
be? If we ignore the type constraint and simply make both ty and tz
parameters because no variable is bound to them in the type
environment (indeed, the only variable x in the type environment is
bound to tx), then we can wrongly type this program to tx -> tz
following a reasoning similar to the one in the example above.
In fact, in this example, neither ty nor tz can be a parameter,
because they are constrained by tx.

The examples above tell us two things: first, that we have to take the
type constraints into account when deciding the parameters of the
schema; second, that after applying the most-general-unifier solution
given by the type constraints everywhere, the remaining fresh types
appearing anywhere in the type environment are consequently constrained
and cannot be turned into parameters. Since the type environment can in
fact also hold type schemas, which already bind some types, we only need
to ensure that none of the fresh types appearing free anywhere in the
type environment are turned into parameters of type schemas.

Thanks to generic support offered by the K tool, we can easily achieve
all the above as follows.

First, add syntax for type schemas:

syntax TypeSchema ::= "(" "forall" Set ")" Type  [binder]

The definition below will be given in such a way that the Set argument
of a type schema will always be a set of fresh types. We also declare
this construct to be a binder, so that we can make use of the generic
free variable function provided by the K tool.

We now replace the old macro of let

rule let X = E in E' => E'[E/X]  [macro]

with the following rule:

rule <k> let X = T:Type in E => E ~> tenv(TEnv) ...</k>
     <mgu> Theta:Mgu </mgu>
     <tenv> TEnv
      => TEnv[(forall freeVariables(applyMgu(Theta, T)) -Set
                      freeVariables(applyMgu(Theta, values TEnv))
              ) applyMgu(Theta, T) / X]
     </tenv>

So the type T of E is being parameterized and then bound to X in the
type environment. The current mgu Theta, which comprises all the type
constraints accumulated so far, is applied to both T and the types in
the type environment. The remaining fresh types in T which do not
appear free in the type environment are then turned into type parameters.
The function freeVariables returns, as expected, the free variables of
its argument as a Set; this is why we declared the type schema to be a
binder above.

Now a LAMBDA variable in the type environment can be bound to either a
type or a type schema. In the first case, the previous rule we had
for variable lookup can be reused, but we have to make sure we check
that T there is of sort Type (adding a sort membership, for example).
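
For example, the reused lookup rule can be the following sketch,
where the sort membership on T is the only change with respect to
the lookup rule of Lesson 5:

rule <k> X:Id => T ...</k>
     <tenv>... X |-> T:Type ...</tenv>
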
In the second case, as explained above, we have to create fresh copies
of the parameters. This can be easily achieved with another
predefined K function, as follows:

rule <k> X:Id => freshVariables(Tvs,T) ...</k>
     <tenv>... X |-> (forall Tvs) T ...</tenv>

Indeed, freshVariables takes a set of variables and a term, and returns the
same term but with each of the given variables replaced by a fresh copy.
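
For example, if id is bound in the type environment to the schema
(forall t) t -> t, then a lookup of id rewrites to freshVariables
applied to the set containing t and to the type t -> t, yielding a
fresh instance such as t1 -> t1, exactly as in the informal
discussion above.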

The operations freeVariables and freshVariables are useful in many K
definitions, so they are predefined in module substitution.k.

Our definition of this let-polymorphic type inferencer is now
complete. To test it, kompile it and then krun all the LAMBDA
programs discussed since Lesson 4. They should all work as expected.
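
Assuming the definition is saved in a file lambda.k and a test
program in, say, id.lambda (both file names are placeholders here),
this amounts to commands along the lines of:

kompile lambda.k
krun id.lambda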

K Languages

Here we present several "real-world" language examples. These languages
demonstrate many of the features you would expect to find in a full-fledged
programming language.

  • SIMPLE: Imperative programming language with threads.
  • KOOL: SIMPLE extended with object-oriented features.
  • FUN: A functional language with algebraic data-types and pattern-matching.
  • LOGIK: A logical programming language based on clause unification.

SIMPLE — Untyped

Author: Grigore Roșu (grosu@illinois.edu)
Organization: University of Illinois at Urbana-Champaign

Author: Traian Florin Șerbănuță (traian.serbanuta@unibuc.ro)
Organization: University of Bucharest

Abstract

This is the K semantic definition of the untyped SIMPLE language.
SIMPLE is intended to be a pedagogical and research language that captures
the essence of the imperative programming paradigm, extended with several
features often encountered in imperative programming languages.
A program consists of a set of global variable declarations and
function definitions. Like in C, function definitions cannot be
nested and each program must have one function called main,
which is invoked when the program is executed. To make it more
interesting and to highlight some of K's strengths, SIMPLE includes
the following features in addition to the conventional imperative
expression and statement constructs:

  • Multidimensional arrays and array references. An array evaluates
    to an array reference, which is a special value holding a location (where
    the elements of the array start) together with the size of the array;
    the elements of the array can be array references themselves (particularly
    when the array is multi-dimensional). Array references are ordinary values,
    so they can be assigned to variables and passed/received by functions.

  • Functions and function values. Functions can have zero or
    more parameters and can return abruptly using a return statement.
    SIMPLE follows a call-by-value parameter passing style, with static scoping.
    Function names evaluate to function abstractions, which thereby become
    ordinary values in the language, just like array references.

  • Blocks with locals. SIMPLE variables can be declared
    anywhere, their scope being from the place where they are declared
    until the end of the most nested enclosing block.

  • Input/Output. The expression read() evaluates to the
    next value in the input buffer, and the statement write(e)
    evaluates e and outputs its value to the output buffer. The
    input and output buffers are lists of values.

  • Exceptions. SIMPLE has parametric exceptions (the value thrown as
    an exception can be caught and bound).

  • Concurrency via dynamic thread creation/termination and
    synchronization. One can spawn a thread to execute any statement.
    The spawned thread shares with its parent its environment at creation time.
    Threads can be synchronized via a join command which blocks the current thread
    until the joined thread completes, via re-entrant locks which can be acquired
    and released, as well as through rendezvous commands.

Like in many other languages, some of SIMPLE's constructs can be
desugared into a smaller set of basic constructs. We do that at the end
of the syntax module, and then we only give semantics to the core constructs.

Note: This definition is commented slightly more than others, because
it is intended to be one of the first non-trivial definitions that a
new user of K sees. We recommend that beginners first check the
language definitions discussed in the K tutorial.

module SIMPLE-UNTYPED-SYNTAX
  imports DOMAINS-SYNTAX

Syntax

We start by defining the SIMPLE syntax. The language constructs discussed
above have the expected syntax and evaluation strategies. Recall that in K
we annotate the syntax with appropriate strictness attributes, thus giving
each language construct the desired evaluation strategy.

Identifiers

Recall from the K tutorial that identifiers are builtin and come under the
syntactic category Id. The special identifier for the function
main belongs to all programs, and plays a special role in the semantics,
so we declare it explicitly. This would not be necessary if the identifiers
were all included automatically in semantic definitions, but that is not
possible because of parsing reasons (e.g., K variables used to match
concrete identifiers would then be ambiguously parsed as identifiers). They
are only included in the parser generated to parse programs (and used by the
kast tool). Consequently, we have to explicitly declare all the
concrete identifiers that play a special role in the semantics, like
main below.

  syntax Id ::= "main" [token]

Declarations

There are two types of declarations: for variables (including arrays) and
for functions. We are going to allow declarations of the form
var x=10, a[10,10], y=23;, which is why we allow the var
keyword to take a list of expressions. The non-terminals used in the two
productions below are defined shortly.

  syntax Stmt ::= "var" Exps ";"
                | "function" Id "(" Ids ")" Block

Expressions

The expression constructs below are standard. Increment (++) takes
an expression rather than a variable because it can also increment an array
element. Recall that the syntax we define in K is what we call the
syntax of the semantics: while powerful enough to define non-trivial
syntaxes
(thanks to the underlying SDF technology that we use), we typically refrain
from defining precise syntaxes, that is, ones which accept precisely the
well-formed programs (that would not be possible anyway in general). That job
is deferred to type systems, which can also be defined in K. In other words,
we are not making any effort to guarantee syntactically that only variables
or array elements are passed to the increment construct, we allow any
expression. Nevertheless, we will only give semantics to those, so expressions
of the form ++5, which parse (but which will be rejected by our type
system in the typed version of SIMPLE later), will get stuck when executed.
Arrays can be multidimensional and can hold other arrays, so the
array lookup operation takes a list of expressions as argument and
applies to an expression (which can in particular be another array
lookup).
The construct sizeOf gives the size of an array in number of elements
of its first dimension. Note that almost all constructs are strict.
The only constructs which are not strict are: the increment (since
its first argument gets updated, so it cannot be evaluated); the
input read, which takes no arguments, so strictness is irrelevant for
it; the logical and and or constructs, which are short-circuited; the
thread spawning construct, which creates a new thread executing the
argument expression and returns its unique identifier to the creating
thread (so it cannot just evaluate its argument in place); and the
assignment, which is only strict in its second argument (for the same
reason as the increment).

  syntax Exp ::= Int | Bool | String | Id
               | "(" Exp ")"             [bracket]
               | "++" Exp
               > Exp "[" Exps "]"        [strict]
               > Exp "(" Exps ")"        [strict]
               | "-" Exp                 [strict]
               | "sizeOf" "(" Exp ")"    [strict]
               | "read" "(" ")"
               > left:
                 Exp "*" Exp             [strict, left]
               | Exp "/" Exp             [strict, left]
               | Exp "%" Exp             [strict, left]
               > left:
                 Exp "+" Exp             [strict, left]
               | Exp "-" Exp             [strict, left]
               > non-assoc:
                 Exp "<" Exp             [strict, non-assoc]
               | Exp "<=" Exp            [strict, non-assoc]
               | Exp ">" Exp             [strict, non-assoc]
               | Exp ">=" Exp            [strict, non-assoc]
               | Exp "==" Exp            [strict, non-assoc]
               | Exp "!=" Exp            [strict, non-assoc]
               > "!" Exp                 [strict]
               > left:
                 Exp "&&" Exp            [strict(1), left]
               | Exp "||" Exp            [strict(1), left]
               > "spawn" Block
               > Exp "=" Exp             [strict(2), right]

We also need comma-separated lists of identifiers and of expressions.
Moreover, we want them to be strict, that is, to evaluate to lists of results
whenever requested (e.g., when they appear as strict arguments of
the constructs above).

  syntax Ids  ::= List{Id,","}
  syntax Exps ::= List{Exp,","}          [strict]  // automatically hybrid now
  syntax Exps ::= Ids
  syntax Val
  syntax Vals ::= List{Val,","}
  syntax Bottom
  syntax Bottoms ::= List{Bottom,","}
  syntax Ids ::= Bottoms

Statements

Most of the statement constructs are standard for imperative languages.
We syntactically distinguish between empty and non-empty blocks, because we
chose Stmts not to be a (;-separated) list of
Stmt. Variables can be declared anywhere inside a block, their scope
ending with the block. Expressions are allowed to be used for their side
effects only (followed by a semicolon ;). Functions are allowed
to abruptly return. The exceptions are parametric, i.e., one can throw a value
which is bound to the variable declared by catch. Threads can be
dynamically created and terminated, and can synchronize with join,
acquire, release and rendezvous. Note that the
strictness attributes obey the intended evaluation strategy of the various
constructs. In particular, the if-then-else construct is strict only in its
first argument (the if-then construct will be desugared into if-then-else),
while the loop constructs are not strict in any arguments. The print
statement construct is variadic, that is, it takes an arbitrary number of
arguments.

  syntax Block ::= "{" "}"
                | "{" Stmt "}"

  syntax Stmt ::= Block
                | Exp ";"                               [strict]
                | "if" "(" Exp ")" Block "else" Block   [avoid, strict(1)]
                | "if" "(" Exp ")" Block
                | "while" "(" Exp ")" Block
                | "for" "(" Stmt Exp ";" Exp ")" Block
                | "return" Exp ";"                      [strict]
                | "return" ";"
                | "print" "(" Exps ")" ";"              [strict]
// NOTE: declaring print strict allows non-deterministic evaluation of its
// arguments. Either keep it like this but document it, make Exps seqstrict,
// or define and use a different expression list here which is seqstrict.
                | "try" Block "catch" "(" Id ")" Block
                | "throw" Exp ";"                       [strict]
                | "join" Exp ";"                        [strict]
                | "acquire" Exp ";"                     [strict]
                | "release" Exp ";"                     [strict]
                | "rendezvous" Exp ";"                  [strict]

The reason we allow a statement sequence as the first argument of
for, rather than a single statement, is because we want to allow more
than one statement to be executed when the loop is initialized.
Also, as seen shortly, macros may expand one statement into several
statements; for example, an initialized variable declaration
statement var x=0; desugars into two statements, namely
var x; x=0;, so if we insisted on a single statement in the
production of for above then we would risk that the macro expansion
of statement var x=0; happens before the macro expansion of for,
also shown below, in which case the latter would not apply anymore
because of the syntactic mismatch.

  syntax Stmt ::= Stmt Stmt                          [right]

// I wish I were able to write the following instead, but it confuses the parser.
//
// syntax Stmts ::= List{Stmt,""}
// syntax Top ::= Stmt | "function" Id "(" Ids ")" Block
// syntax Pgm ::= List{Top,""}
//
// With that, I could have also eliminated the empty block

Desugared Syntax

This part desugars some of SIMPLE's language constructs into core ones.
We only want to give semantics to core constructs, so we get rid of the
derived ones before we start the semantics. All desugaring macros below are
straightforward.

  rule if (E) S => if (E) S else {}                                 [macro]
  rule for(Start Cond; Step) {S} => {Start while (Cond) {S Step;}}  [macro]
  rule for(Start Cond; Step) {} => {Start while (Cond) {Step;}}     [macro]
  rule var E1:Exp, E2:Exp, Es:Exps; => var E1; var E2, Es;          [macro-rec]
  rule var X:Id = E; => var X; X = E;                               [macro]
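
For example, under the macros above the fragment

  var x = 0;
  if (x <= 0) { x = 1; }

desugars into

  var x; x = 0;
  if (x <= 0) { x = 1; } else {}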

For the semantics, we can therefore assume from now on that each
conditional has both branches, that there are only while loops, and
that each variable is declared alone and without any initialization as part of
the declaration.

endmodule


module SIMPLE-UNTYPED
  imports SIMPLE-UNTYPED-SYNTAX
  imports DOMAINS

Basic Semantic Infrastructure

Before one starts adding semantic rules to a K definition, one needs to
define the basic semantic infrastructure consisting of definitions for
values and configuration. As discussed in the definitions
in the K tutorial, the values are needed to know when to stop applying
the heating rules and when to start applying the cooling rules corresponding
to strictness or context declarations. The configuration serves as a backbone
for the process of configuration abstraction which allows users to only
mention the relevant cells in each semantic rule, the rest of the configuration
context being inferred automatically. Although in some cases the configuration
could be automatically inferred from the rules, we believe that it is very
useful for language designers/semanticists to actually think of and design
their configuration explicitly, so the current implementation of K requires
one to define it.

Values

We here define the values of the language that the various fragments of
programs evaluate to. First, integers and Booleans are values. As discussed,
arrays evaluate to special array reference values holding (1) a location from
where the array's elements are contiguously allocated in the store, and
(2) the size of the array. Functions evaluate to function values as
λ-abstractions (we do not need to evaluate functions to closures
because each function is executed in the fixed global environment and
function definitions cannot be nested). Like in IMP and other
languages, we finally tell the tool that values are K results.

  syntax Val ::= Int | Bool | String
               | array(Int,Int)
               | lambda(Ids,Stmt)
  syntax Exp ::= Val
  syntax Exps ::= Vals
  syntax Vals ::= Bottoms
  syntax KResult ::= Val
                   | Vals  // TODO: should not need this

The inclusion of values in expressions follows the methodology of
syntactic definitions (like, e.g., in SOS): extend the syntax of the language
to encompass all values and additional constructs needed to give semantics.
In addition to that, it allows us to write the semantic rules using the
original syntax of the language, and to parse them with the same (now extended
with additional values) parser. If writing the semantics directly on the K
AST, using the associated labels instead of the syntactic constructs, then one
would not need to include values in expressions.

Configuration

The K configuration of SIMPLE consists of a top level cell, T,
holding a threads cell, a global environment map cell genv
mapping the global variables and function names to their locations, a shared
store map cell store mapping each location to some value, a set cell
busy holding the locks which have been acquired but not yet released
by threads, a set cell terminated holding the unique identifiers of
the threads which already terminated (needed for join), input
and output list cells, and a nextLoc cell holding a natural
number indicating the next available location. Unlike in the small languages
in the K tutorial, where we used the fresh predicate to generate fresh
locations, in larger languages, like SIMPLE, we prefer to explicitly manage
memory. The location counter in nextLoc models an actual physical
location in the store; for simplicity, we assume arbitrarily large memory and
no garbage collection. The threads cell contains one thread
cell for each existing thread in the program. Note that the thread cell has
multiplicity *, which means that at any given moment there could be zero,
one or more thread cells. Each thread cell contains a
computation cell k, a control cell holding the various
control structures needed to jump to certain points of interest in the program
execution, a local environment map cell env mapping the thread local
variables to locations in the store, and finally a holds map cell
indicating what locks have been acquired by the thread and not released so far
and how many times (SIMPLE's locks are re-entrant). The control cell
currently contains only two subcells, a function stack fstack which
is a list and an exception stack xstack which is also a list.
One can add more control structures in the control cell, such as a
stack for break/continue of loops, etc., if the language is extended with more
control-changing constructs. Note that all cells except for k are
also initialized, in that they contain a ground term of their corresponding
sort. The k cell is initialized with the program that will be passed
to the K tool, as indicated by the $PGM variable, followed by the
execute task (defined shortly).

  // the syntax declarations below are required because the sorts are
  // referenced directly by a production and, because of the way KIL to KORE
  // is implemented, the configuration syntax is not available yet
  // should simply work once KIL is removed completely
  // check other definitions for this hack as well

  syntax ControlCell
  syntax ControlCellFragment

  configuration <T color="red">
                  <threads color="orange">
                    <thread multiplicity="*" color="yellow">
                      <k color="green"> $PGM:Stmt ~> execute </k>
                    //<br/> // TODO(KORE): support latex annotations #1799
                      <control color="cyan">
                        <fstack color="blue"> .List </fstack>
                        <xstack color="purple"> .List </xstack>
                      </control>
                    //<br/> // TODO(KORE): support latex annotations #1799
                      <env color="violet"> .Map </env>
                      <holds color="black"> .Map </holds>
                      <id color="pink"> 0 </id>
                    </thread>
                  </threads>
                //<br/> // TODO(KORE): support latex annotations #1799
                  <genv color="pink"> .Map </genv>
                  <store color="white"> .Map </store>
                  <busy color="cyan"> .Set </busy>
                  <terminated color="red"> .Set </terminated>
                //<br/> // TODO(KORE): support latex annotations #1799
                  <input color="magenta" stream="stdin"> .List </input>
                  <output color="brown" stream="stdout"> .List </output>
                  <nextLoc color="gray"> 0 </nextLoc>
                </T>

Declarations and Initialization

We start by defining the semantics of declarations (for variables,
arrays and functions).

Variable Declaration

The SIMPLE syntax was desugared above so that each variable is
declared alone and its initialization is done as a separate statement.
The semantic rule below matches resulting variable declarations of the
form var X; on top of the k cell
(indeed, note that the k cell is complete, or round, to the
left, and is torn, or ruptured, to the right), allocates a fresh
location L in the store which is initialized with a special value
(indeed, the unit ., or nothing, is matched anywhere
in the map ‒note the tears at both sides‒ and replaced with the
mapping L ↦ ⊥), and binds X to L in the local
environment shadowing previous declarations of X, if any.
This possible shadowing of X requires us to therefore update the
entire environment map, which is expensive and can significantly slow
down the execution of larger programs. On the other hand, since we know
that L is not already bound in the store, we simply add the binding
L ↦ ⊥ to the store, thus avoiding a potentially complete
traversal of the store map in order to update it. We prefer the approach
used for updating the store whenever possible, because, in addition to being
faster, it offers more true concurrency than the latter; indeed, according
to the concurrent semantics of K, the store is not frozen while
L ↦ ⊥ is added to it, while the environment is frozen during the
update operation Env[L/X]. The variable declaration command is
also removed from the top of the computation cell and the fresh location
counter is incremented. The undefined symbol added in the store
is of sort KItem, instead of Val, on purpose; this way, the
store lookup rules will get stuck when one attempts to lookup an
uninitialized location. All the above happen in one transactional step,
with the rule below. Note also how configuration abstraction allows us to
only mention the needed cells; indeed, as the configuration above states,
the k and env cells are actually located within a
thread cell within the threads cell, but one needs
not mention these: the configuration context of the rule is
automatically transformed to match the declared configuration
structure.

  syntax KItem ::= "undefined"  [latex(\bot)]

  rule <k> var X:Id; => . ...</k>
       <env> Env => Env[X <- L] </env>
       <store>... .Map => L |-> undefined ...</store>
       <nextLoc> L => L +Int 1 </nextLoc>

Array Declaration

The K semantics of the uni-dimensional array declaration is somehow similar
to the above declaration of ordinary variables. First, note the
context declaration below, which requests the evaluation of the array
dimension. Once evaluated, say to a natural number N, then
N +Int 1 locations are allocated in the store for
an array of size N, the additional location (chosen to be the first
one allocated) holding the array reference value. The array reference
value array(L,N) states that the array has size N and its
elements are located contiguously in the store starting with location
L. The operation L … L' ↦ V, defined at the end of this
file in the auxiliary operation section, initializes each location in
the list L … L' to V. Note that, since the dimensions of
array declarations can be arbitrary expressions, this virtually means
that we can dynamically allocate memory in SIMPLE by means of array
declarations.

  context var _:Id[HOLE];

  rule <k> var X:Id[N:Int]; => . ...</k>
       <env> Env => Env[X <- L] </env>
       <store>... .Map => L |-> array(L +Int 1, N)
                          (L +Int 1) ... (L +Int N) |-> undefined ...</store>
       <nextLoc> L => L +Int 1 +Int N </nextLoc>
    requires N >=Int 0

SIMPLE allows multi-dimensional arrays. For semantic simplicity, we
desugar them all into uni-dimensional arrays by code transformation.
This way, we only need to give semantics to uni-dimensional arrays.
First, note that the context rule above actually evaluates all the array
dimensions (that's why we defined the expression lists strict!).
Upon evaluating the array dimensions, the code generation rule below
desugars a multi-dimensional array declaration into uni-dimensional
declarations.
To this aim, we introduce two special unique variable identifiers,
$1 and $2. The first variable, $1, iterates
through and initializes each element of the first dimension with an array
of the remaining dimensions, declared as variable $2:

  syntax Id ::= "$1" | "$2"
  rule var X:Id[N1:Int, N2:Int, Vs:Vals];
    => var X[N1];
       {
         for(var $1 = 0; $1 <= N1 - 1; ++$1) {
           var $2[N2, Vs];
           X[$1] = $2;
         }
       }
    [structural]

Ideally, one would like to perform syntactic desugarings like the one
above before the actual semantics. Unfortunately, that was not possible in
this case because the dimension expressions of the multi-dimensional array need
to be evaluated first. Indeed, the desugaring rule above does not work if the
dimensions of the declared array are arbitrary expressions, because they can
have side effects (e.g., a[++x,++x]) and those side effects would be
propagated each time the expression is evaluated in the desugaring code (note
that both the loop condition and the nested multi-dimensional declaration
would need to evaluate the expressions given as array dimensions).

Function declaration

Functions are evaluated to λ-abstractions and stored like any other
values in the store. A binding is added into the environment for the function
name to the location holding its body. Similarly to the C language, SIMPLE
only allows function declarations at the top level of the program. More
precisely, the subsequent semantics of SIMPLE only works well when one
respects this requirement. Indeed, the simplistic context-free parser
generated by the grammar above is more generous than we may want, in that it
allows function declarations anywhere any declaration is allowed, including
inside arbitrary blocks. However, as the rule below shows, we are not
storing the declaration environment with the λ-abstraction value as
closures do. Instead, as seen shortly, we switch to the global environment
whenever functions are invoked, which is consistent with our requirement that
functions should only be declared at the top. Thus, if one declares local
functions, then one may see unexpected behaviors (e.g., when one shadows a
global variable before declaring a local function). The type checker of
SIMPLE, also defined in K (see examples/simple/typed/static),
discards programs which do not respect this requirement.

  rule <k> function F(Xs) S => . ...</k>
       <env> Env => Env[F <- L] </env>
       <store>... .Map => L |-> lambda(Xs, S) ...</store>
       <nextLoc> L => L +Int 1 </nextLoc>

When we are done with the first pass (pre-processing), the computation
cell k contains only the token execute (see the configuration
declaration above, where the computation item execute was placed
right after the program in the k cell of the initial configuration)
and the cell genv is empty. In this case, we have to call
main() and to initialize the global environment by transferring the
contents of the local environment into it. We prefer to do it this way, as
opposed to processing all the top level declarations directly within the global
environment, because we want to avoid duplication of semantics: the syntax of
the global declarations is identical to that of their corresponding local
declarations, so the semantics of the latter suffices provided that we copy
the local environment into the global one once we are done with the
pre-processing. We want this separate pre-processing step precisely because
we want to create the global environment. All (top-level) functions end up
having their names bound in the global environment and, as seen below, they
are executed in that same global environment; all these mean, in particular,
that the functions "see" each other, allowing for mutual recursion, etc.

  syntax KItem ::= "execute"
  rule <k> execute => main(.Exps); </k>
       <env> Env </env>
       <genv> .Map => Env </genv>  [structural]

Expressions

We next define the K semantics of all the expression constructs.

Variable lookup

When a variable X is the first computational task, and X is bound to some
location L in the environment, and L is mapped to some value V in the
store, then we rewrite X into V:

  rule <k> X:Id => V ...</k>
       <env>... X |-> L ...</env>
       <store>... L |-> V:Val ...</store>  [lookup]

Note that the rule above excludes reading ⊥ (the undefined value),
because ⊥ is not a value and V is checked at runtime to be a value.

Variable/Array increment

This is tricky, because we want to allow both ++x and ++a[5].
Therefore, we need to extract the lvalue of the expression to increment.
To do that, we state that the expression to increment should be wrapped
by the auxiliary lvalue operation and then evaluated. The semantics
of this auxiliary operation is defined at the end of this file. For now, all
we need to know is that it takes an expression and evaluates to a location
value. Location values, also defined at the end of the file, are integers
wrapped with the operation loc, to distinguish them from ordinary
integers.

  context ++(HOLE => lvalue(HOLE))
  rule <k> ++loc(L) => I +Int 1 ...</k>
       <store>... L |-> (I => I +Int 1) ...</store>  [increment]

Arithmetic operators

There is nothing special about the following rules. They rewrite the
language constructs to their library counterparts when their arguments
become values of expected sorts:

  rule I1 + I2 => I1 +Int I2
  rule Str1 + Str2 => Str1 +String Str2
  rule I1 - I2 => I1 -Int I2
  rule I1 * I2 => I1 *Int I2
  rule I1 / I2 => I1 /Int I2 requires I2 =/=K 0
  rule I1 % I2 => I1 %Int I2 requires I2 =/=K 0
  rule - I => 0 -Int I
  rule I1 < I2 => I1 <Int I2
  rule I1 <= I2 => I1 <=Int I2
  rule I1 > I2 => I1 >Int I2
  rule I1 >= I2 => I1 >=Int I2

The equality and inequality constructs reduce to syntactic comparison
of the two argument values (which is what the equality on K terms does).

  rule V1:Val == V2:Val => V1 ==K V2
  rule V1:Val != V2:Val => V1 =/=K V2

The logical negation is clear, but the logical conjunction and disjunction
are short-circuited:

  rule ! T => notBool(T)
  rule true  && E => E
  rule false && _ => false
  rule true  || _ => true
  rule false || E => E

Array lookup

Untyped SIMPLE does not check array bounds (the dynamically typed version of
it, in examples/simple/typed/dynamic, does check for array out of
bounds). The first rule below desugars the multi-dimensional array access to
uni-dimensional array access; recall that the array access operation was
declared strict, so all sub-expressions involved are already values at this
stage. The second rule rewrites the array access to a lookup operation at a
precise location; we prefer to do it this way to avoid locking the store.
The semantics of the auxiliary lookup operation is straightforward,
and is defined at the end of the file.

// The [anywhere] feature is underused, because it would only be used
// at the top of the computation or inside the lvalue wrapper. So it
// may not be worth it, or we may need to come up with a special notation
// allowing us to enumerate contexts for [anywhere] rules.
  rule V:Val[N1:Int, N2:Int, Vs:Vals] => V[N1][N2, Vs]
    [structural, anywhere]

  rule array(L,_)[N:Int] => lookup(L +Int N)
    [structural, anywhere]
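
For instance, in the hypothetical program below a[1,2] first desugars to
a[1][2]; the inner access rewrites to a lookup of the array reference
stored for row 1, and the outer access then rewrites to a lookup of the
element itself:

  function main() {
    var a[2,3];
    a[1,2] = 7;
    print(a[1,2], " ", a[1][2], "\n");   // both forms print 7
  }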

Size of an array

The size of the array is stored in the array reference value, and the
sizeOf construct was declared strict, so:

  rule sizeOf(array(_,N)) => N

Function call

Function application was strict in both its arguments, so we can
assume that both the function and its arguments are evaluated to
values (the former expected to be a λ-abstraction). The first
rule below matches a well-formed function application on top of the
computation and performs the following steps atomically: it switches
to the function body followed by return; (for the case in
which the function does not use an explicit return statement); it
pushes the remaining computation, the current environment, and the
current control data onto the function stack (the remaining
computation can thus also be discarded from the computation cell,
because an unavoidable subsequent return statement ‒see
above‒ will always recover it from the stack); it switches the
current environment (which is being pushed on the function stack) to
the global environment, which is where the free variables in the
function body should be looked up; it binds the formal parameters to
fresh locations in the new environment, and stores the actual
arguments to those locations in the store (this latter step is easily
done by reducing the problem to variable declarations, whose semantics
we have already defined; the auxiliary operation mkDecls is
defined at the end of the file). The second rule pops the
computation, the environment and the control data from the function
stack when a return statement is encountered as the next
computational task, passing the returned value to the popped
computation (the popped computation was the context in which the
returning function was called). Note that the pushing/popping of the
control data is crucial. Without it, one may have a function that
contains an exception block with a return statement inside, which
would put the xstack cell in an inconsistent state (since the
exception block modifies it, but that modification should be
irrelevant once the function returns). We add an artificial
nothing value to the language, which is returned by the
nullary return; statements.

  syntax KItem ::= (Map,K,ControlCellFragment)

  rule <k> lambda(Xs,S)(Vs:Vals) ~> K => mkDecls(Xs,Vs) S return; </k>
       <control>
         <fstack> .List => ListItem((Env,K,C)) ...</fstack>
         C
       </control>
       <env> Env => GEnv </env>
       <genv> GEnv </genv>

  rule <k> return(V:Val); ~> _ => V ~> K </k>
       <control>
         <fstack> ListItem((Env,K,C)) => .List ...</fstack>
         (_ => C)
       </control>
       <env> _ => Env </env>

  syntax Val ::= "nothing"
  rule return; => return nothing;   [macro]

Like for division-by-zero, it is left unspecified what happens
when the nothing value is used in domain calculations. For
example, from the perspective of the language semantics,
7 +Int nothing can evaluate to anything, or
may not evaluate at all (be undefined). If one wants to make sure that
such artificial values are never misused, then one needs to define a static
checker (also using K, like our type checker in
examples/simple/typed/static) and reject programs that misuse them.
Note that, unlike the undefined symbol which had the sort K
instead of Val, we defined nothing to be a value. That
is because, as explained above, we do not want the program to get
stuck when nothing is returned by a function. Instead, we want the
behavior to be unspecified; in particular, if one is careful to never
use the returned value in domain computation, like it happens when we
call a function for its side effects (e.g., with a statement of the
form f(x);), then the program does not get stuck.
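
For example, the following hypothetical program is fine, because the
nothing returned by inc() is immediately discarded by the
expression statement rule (given below); replacing the call inc(); with,
say, 1 + inc(); would get stuck instead:

  var y = 0;
  function inc() { ++y; }
  function main() { inc(); print(y, "\n"); }   // prints 1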

Read

The read() expression construct simply evaluates to the next
input value, at the same time discarding the input value from the
in cell.

  rule <k> read() => I ...</k> <input> ListItem(I:Int) => .List ...</input>  [read]

Assignment

In SIMPLE, like in C, assignments are expression constructs and not statement
constructs. To make it a statement all one needs to do is to follow it by a
semi-colon ; (see the semantics for expression statements below).
Like for the increment, we want to allow assignments not only to variables but
also to array elements, e.g., e1[e2] = e3 where e1 evaluates
to an array reference, e2 to a natural number, and e3 to any
value. Thus, we first compute the lvalue of the left-hand-side expression
that appears in an assignment, and then we do the actual assignment to the
resulting location:

  context (HOLE => lvalue(HOLE)) = _

  rule <k> loc(L) = V:Val => V ...</k> <store>... L |-> (_ => V) ...</store>
    [assignment]
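
Since assignment is an expression, strict only in its second argument, it
can also be nested; a hypothetical example:

  function main() {
    var x, y;
    x = (y = 3) + 1;        // y = 3 evaluates to 3, so x becomes 4
    print(x, " ", y, "\n");  // prints 4 3
  }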

Statements

We next define the K semantics of statements.

Blocks

Empty blocks are simply discarded, as shown in the first rule below.
For non-empty blocks, we schedule the enclosed statement but we have to
make sure the environment is recovered after the enclosed statement executes.
Recall that we allow local variable declarations, whose scope is the block
enclosing them. That is the reason for which we have to recover the
environment after the block. This allows us to have a very simple semantics
for variable declarations, as we did above. One can make the two rules below
computational if one wants them to count as computational steps.

  rule {} => .  [structural]
  rule <k> { S } => S ~> setEnv(Env) ...</k>  <env> Env </env>  [structural]

The basic definition of environment recovery is straightforward and
given in the section on auxiliary constructs at the end of the file.

There are two common alternatives to the above semantics of blocks.
One is to keep track of the variables which are declared in the block and only
recover those at the end of the block. This way one does more work for
variable declarations but conceptually less work for environment recovery; we
say conceptually because it is not clear that it is indeed the case that
one does less work when AC matching is involved. The other alternative is to
work with a stack of environments instead of a flat environment, and push the
current environment when entering a block and pop it when exiting it. This
way, one does more work when accessing variables (since one has to search the
variable in the environment stack in a top-down manner), but on the other hand
uses smaller environments and the definition gets closer to an implementation.
Based on experience with dozens of language semantics and other K definitions,
we have found that our approach above is the best trade-off between elegance
and efficiency (especially since rewrite engines have built-in techniques to
lazily copy terms, by need, thus not creating unnecessary copies),
so it is the one that we follow in general.

Sequential composition

Sequential composition is desugared into K's builtin sequentialization
operation (recall that, like in C, the semi-colon ; is not a
statement separator in SIMPLE — it is either a statement terminator or a
construct that turns an expression into a statement). The rule below is
structural, so it does not count as a computational step. One can make it
computational if one wants it to count as a step. Note that K allows
to define the semantics of SIMPLE in such a way that statements eventually
dissolve from the top of the computation when they are completed; this is in
sharp contrast to (artificially) evaluating them to a special
skip statement value and then getting rid of that special value, as
it is the case in other semantic approaches (where everything must evaluate
to something). This means that once S₁ completes in the rule below, S₂
becomes automatically the next computation item without any additional
(explicit or implicit) rules.

  rule S1:Stmt S2:Stmt => S1 ~> S2  [structural]

A subtle aspect of the rule above is that S₁ need not be an atomic
statement: since sequential composition is itself a production of sort
Stmt, the pattern S1:Stmt also matches compositions of statements.
This matters because desugaring macros can indeed produce left-associated
sequential compositions. For example, the code var x=0; x=1; is
desugared to (var x; x=0;) x=1;, so although originally the first term of
the sequential composition was an atomic statement, after desugaring it
became a composition itself. Note that the attribute [right] associated
to the sequential composition production is an attribute of the syntax, and not
of the semantics: e.g., it tells the parser to parse
var x; x=0; x=1; as var x; (x=0; x=1;), but it
does not tell the rewrite engine to rewrite (var x; x=0;) x=1; to
var x; (x=0; x=1;).

Expression statements

Expression statements are only used for their side effects, so their result
value is simply discarded. Common examples of expression statements are ones
of the form ++x;, x=e;, e1[e2]=e3;, etc.

  rule _:Val; => .

Conditional

Since the conditional was declared with the strict(1) attribute, we
can assume that its first argument will eventually be evaluated. The rules
below cover the only two possibilities in which the conditional is allowed to
proceed (otherwise the rewriting process gets stuck).

  rule if ( true) S else _ => S
  rule if (false) _ else S => S

While loop

The simplest way to give the semantics of the while loop is by unrolling.
Note, however, that its unrolling is only allowed when the while loop reaches
the top of the computation (to avoid non-termination of unrolling). We prefer
the rule below to be structural, because we don't want the unrolling of the
while loop to count as a computational step; this is unavoidable in
conventional semantics, but it is possible in K thanks to its distinction
between structural and computational rules. The simple while loop semantics
below works because our while loops in SIMPLE are indeed very basic. If we
allowed break/continue of loops then we would need a completely different
semantics, which would also involve the control cell.

  rule while (E) S => if (E) {S while(E)S}  [structural]
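
Concretely, one structural unrolling step rewrites the loop at the top of
the computation as sketched below (with body { n = n - 1; }):

  // while (n > 0) { n = n - 1; }
  //   => if (n > 0) { { n = n - 1; } while (n > 0) { n = n - 1; } }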

Print

The print statement was strict, so all its arguments are now
evaluated (recall that print is variadic). We append each of
its evaluated arguments to the output buffer, and discard the residual
print statement with an empty list of arguments.

  rule <k> print(V:Val, Es => Es); ...</k> <output>... .List => ListItem(V) </output>
    [print]
  rule print(.Vals); => .  [structural]

Exceptions

SIMPLE allows parametric exceptions, in that one can throw and catch a
particular value. The statement try S₁ catch(X) S₂
proceeds with the evaluation of S₁. If S₁ evaluates normally, i.e.,
without any exception thrown, then S₂ is discarded and the execution
continues normally. If S₁ throws an exception with a statement of the
form throw E, then E is first evaluated to some value V
(throw was declared to be strict), then V is bound to X, then
S₂ is evaluated in the new environment while the remainder of S₁ is
discarded, then the environment is recovered and the execution continues
normally with the statement following the try S₁ catch(X) S₂ statement.
Exceptions can be nested and the statements in the
catch part (S₂ in our case) can throw exceptions to the
upper level. One should be careful with how one handles the control data
structures here, so that the abrupt changes of control due to exception
throwing and to function returns interact correctly with each other.
For example, we want to allow function calls inside the statement S₁ in
a try S₁ catch(X) S₂ block which can throw an exception
that is not caught by the function but instead is propagated to the
try S₁ catch(X) S₂ block that called the function.
Therefore, we have to make sure that the function stack as well as other
potential control structures are also properly modified when the exception
is thrown to correctly recover the execution context. This can be easily
achieved by pushing/popping the entire current control context onto the
exception stack. The three rules below modularly do precisely the above.

  syntax KItem ::= (Id,Stmt,K,Map,ControlCellFragment)

  syntax KItem ::= "popx"

  rule <k> (try S1 catch(X) {S2} => S1 ~> popx) ~> K </k>
       <control>
         <xstack> .List => ListItem((X, S2, K, Env, C)) ...</xstack>
         C
       </control>
       <env> Env </env>

  rule <k> popx => . ...</k>
       <xstack> ListItem(_) => .List ...</xstack>

  rule <k> throw V:Val; ~> _ => { var X = V; S2 } ~> K </k>
       <control>
         <xstack> ListItem((X, S2, K, Env, C)) => .List ...</xstack>
         (_ => C)
       </control>
       <env> _ => Env </env>

The catch statement S₂ needs to be executed in the original environment,
but where the thrown value V is bound to the catch variable X. We here
chose to rely on two previously defined constructs when giving semantics to
the catch part of the statement: (1) the variable declaration with
initialization, for binding X to V; and (2) the block construct for
preventing X from shadowing variables in the original environment upon the
completion of S₂.
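
The hypothetical program below exercises precisely the interaction discussed
above: the exception is thrown inside a function call, so both the xstack
and the fstack must be restored to the context of the try block:

  function div(x, y) {
    if (y == 0) { throw "division by zero"; }
    return x / y;
  }
  function main() {
    try { print(div(7, 0), "\n"); }
    catch (msg) { print(msg, "\n"); }   // prints: division by zero
  }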

Threads

SIMPLE's threads can be created and terminated dynamically, and can
synchronize by acquiring and releasing re-entrant locks and by rendezvous.
We discuss the seven rules giving the semantics of these operations below.

Thread creation

Threads can be created by any other threads using the spawn S
construct. The spawn expression construct evaluates to the unique identifier
of the newly created thread and, at the same time, a new thread cell is added
into the configuration, initialized with the S statement and sharing the
same environment with the parent thread. Note that the newly created
thread cell is torn. That means that the remaining cells are added
and initialized automatically as described in the definition of SIMPLE's
configuration. This is part of K's configuration abstraction mechanism.

  rule <thread>...
         <k> spawn S => !T:Int ...</k>
         <env> Env </env>
       ...</thread>
       (.Bag => <thread>...
               <k> S </k>
               <env> Env </env>
               <id> !T </id>
             ...</thread>)

Thread termination

Dually to the above, when a thread terminates, its assigned computation (the
contents of its k cell) is empty, so the thread can be dissolved.
However, since no discipline is imposed on how locks are acquired and released,
it can be the case that a terminating thread still holds locks. Those locks
must be released, so other threads attempting to acquire them do not deadlock.
We achieve that by removing all the locks held by the terminating thread in its
holds cell from the set of busy locks in the busy cell
(keys(H) returns the domain of the map H as a set, that is, only
the locks themselves ignoring their multiplicity). As seen below, a lock is
added to the busy cell as soon as it is acquired for the first time
by a thread. The unique identifier of the terminated thread is also collected
into the terminated cell, so the join construct knows which
threads have terminated.

  rule (<thread>... <k>.</k> <holds>H</holds> <id>T</id> ...</thread> => .Bag)
       <busy> Busy => Busy -Set keys(H) </busy>
       <terminated>... .Set => SetItem(T) ...</terminated>

Thread joining

Thread joining is now straightforward: all we need to do is to check whether
the identifier of the thread to be joined is in the terminated cell.
If yes, then the join statement dissolves and the joining thread
continues normally; if not, then the joining thread gets stuck.

  rule <k> join T:Int; => . ...</k>
       <terminated>... SetItem(T) ...</terminated>

Acquire lock

There are two cases to distinguish when a thread attempts to acquire a lock
(in SIMPLE any value can be used as a lock):
(1) The thread does not currently have the lock, in which case it has to
take it provided that the lock is not already taken by another thread (see
the side condition of the first rule).
(2) The thread already has the lock, in which case it just increments its
counter for the lock (the locks are re-entrant). These two cases are captured
by the two rules below:

  rule <k> acquire V:Val; => . ...</k>
       <holds>... .Map => V |-> 0 ...</holds>
       <busy> Busy (.Set => SetItem(V)) </busy>
    requires (notBool(V in Busy))  [acquire]

  rule <k> acquire V; => . ...</k>
       <holds>... V:Val |-> (N => N +Int 1) ...</holds>

Release lock

Similarly, there are two corresponding cases to distinguish when a thread
releases a lock:
(1) The thread holds the lock more than once, in which case all it needs to do
is to decrement the lock counter.
(2) The thread holds the lock only once, in which case it needs to remove it
from its holds cell and also from the shared busy cell,
so other threads can acquire it if they need to.

  rule <k> release V:Val; => . ...</k>
       <holds>... V |-> (N => N -Int 1) ...</holds>
    requires N >Int 0

  rule <k> release V; => . ...</k> <holds>... V:Val |-> 0 => .Map ...</holds>
       <busy>... SetItem(V) => .Set ...</busy>

Rendezvous synchronization

In addition to synchronization through acquire and release of locks, SIMPLE
also provides a construct for rendezvous synchronization. A thread whose next
statement to execute is rendezvous(V) gets stuck until another
thread reaches an identical statement; when that happens, the two threads
drop their rendezvous statements and continue their executions. If three
threads happen to have an identical rendezvous statement as their next
statement, then precisely two of them will synchronize and the other will
remain blocked until another thread reaches a similar rendezvous statement.
The rule below is as simple as it can be. Note, however, that, again, it is
K's mechanism for configuration abstraction that makes it work as desired:
since the only cell with multiplicity * that contains a k cell is
the thread cell, the only way to concretize the rule below to the
actual configuration of SIMPLE is to place each k cell in its own
thread cell.

  rule <k> rendezvous V:Val; => . ...</k>
       <k> rendezvous V; => . ...</k>  [rendezvous]

Auxiliary declarations and operations

In this section we define all the auxiliary constructs used in the
above semantics.

Making declarations

The mkDecls auxiliary construct turns a list of identifiers
and a list of values into a sequence of corresponding variable
declarations.

  syntax Stmt ::= mkDecls(Ids,Vals)  [function]
  rule mkDecls((X:Id, Xs:Ids), (V:Val, Vs:Vals)) => var X=V; mkDecls(Xs,Vs)
  rule mkDecls(.Ids,.Vals) => {}
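
For example, here is a sketch of one expansion:

  // mkDecls((x, y, .Ids), (1, 2, .Vals))
  //   => var x = 1; var y = 2; {}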

Location lookup

The operation below is straightforward. Note that we tag it with the same
lookup tag as the variable lookup rule defined above. This way,
both rules will be considered transitions when we include the lookup
tag in the transition option of kompile.

  syntax Exp ::= lookup(Int)
  rule <k> lookup(L) => V ...</k> <store>... L |-> V:Val ...</store>  [lookup]

Environment recovery

We have already discussed the environment recovery auxiliary operation in the
IMP++ tutorial:

// TODO: eliminate the env wrapper, like we did in IMP++

  syntax KItem ::= setEnv(Map)
  rule <k> setEnv(Env) => . ...</k> <env> _ => Env </env>  [structural]

While theoretically sufficient, the basic definition for environment
recovery alone is suboptimal. Consider a loop while (E)S,
whose semantics (see above) was given by unrolling. S
is a block. Then the semantics of blocks above, together with the
unrolling semantics of the while loop, will yield a computation
structure in the k cell that increasingly grows, adding a new
environment recovery task right in front of the already existing sequence of
similar environment recovery tasks (this phenomenon is similar to the "tail
recursion" problem). Of course, when we have a sequence of environment
recovery tasks, we only need to keep the last one. The elegant rule below
does precisely that, thus avoiding the unnecessary computation explosion
problem:

  rule (setEnv(_) => .) ~> setEnv(_)  [structural]

In fact, the above follows a common convention in K for recovery
operations of cell contents: the meaning of a computation task of the form
cell(C) that reaches the top of the computation is that the current
contents of cell cell is discarded and gets replaced with C. We
did not add support for these special computation tasks in our current
implementation of K, so we need to define them as above.

lvalue and loc

For convenience in giving the semantics of constructs like the increment and
the assignment, that we want to operate the same way on variables and on
array elements, we used an auxiliary lvalue(E) construct which was
expected to evaluate to the lvalue of the expression E. This is only
defined when E has an lvalue, that is, when E is either a variable or
evaluates to an array element. lvalue(E) evaluates to a value of
the form loc(L), where L is the location where the value of E
can be found; for clarity, we use loc to structurally distinguish
natural numbers from location values. In giving semantics to lvalue
there are two cases to consider. (1) If E is a variable, then all we need
to do is to grab its location from the environment. (2) If E is an array
element, then we first evaluate the array and its index in order to identify
the exact location of the element of concern, and then return that location;
the last rule below works because its preceding context declarations ensure
that the array and its index are evaluated, and then the rule for array lookup
(defined above) rewrites the evaluated array access construct to its
corresponding store lookup operation.

// For parsing reasons, we prefer to allow lvalue to take a K

  syntax Exp ::= lvalue(K)
  syntax Val ::= loc(Int)

// Local variable

  rule <k> lvalue(X:Id => loc(L)) ...</k> <env>... X |-> L:Int ...</env>
    [structural]

// Array element: evaluate the array and its index;
// then the array lookup rule above applies.

  context lvalue(_::Exp[HOLE::Exps])
  context lvalue(HOLE::Exp[_::Exps])

// Finally, return the location of the desired array element

  rule lvalue(lookup(L:Int) => loc(L))  [structural]

Initializing multiple locations

The following operation initializes a sequence of locations with the same
value:

  syntax Map ::= Int "..." Int "|->" K
    [function, latex({#1}\ldots{#2}\mapsto{#3})]
  rule N...M |-> _ => .Map  requires N >Int M
  rule N...M |-> K => N |-> K (N +Int 1)...M |-> K  requires N <=Int M
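
For example, unfolding the function on a small range (a sketch):

  // 3...5 |-> undefined
  //   => 3 |-> undefined (4...5 |-> undefined)
  //   => ... => 3 |-> undefined 4 |-> undefined 5 |-> undefined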

The semantics of SIMPLE is now complete. Make sure you kompile the
definition with the right options in order to generate the desired model.
No kompile options are needed if you only want to execute the definition
(and thus get an interpreter), but if you want to search for different
program behaviors then you need to kompile with the transition option
including rule tags such as lookup, increment, acquire, etc. See the
IMP++ tutorial for what the transition option means and how to use it.

endmodule

Go to Lesson 2, SIMPLE typed static

SIMPLE — Untyped

Author: Grigore Roșu (grosu@illinois.edu)
Organization: University of Illinois at Urbana-Champaign

Author: Traian Florin Șerbănuță (traian.serbanuta@unibuc.ro)
Organization: University of Bucharest

Abstract

This is the K semantic definition of the untyped SIMPLE language.
SIMPLE is intended to be a pedagogical and research language that captures
the essence of the imperative programming paradigm, extended with several
features often encountered in imperative programming languages.
A program consists of a set of global variable declarations and
function definitions. Like in C, function definitions cannot be
nested and each program must have one function called main,
which is invoked when the program is executed. To make it more
interesting and to highlight some of K's strengths, SIMPLE includes
the following features in addition to the conventional imperative
expression and statement constructs:

  • Multidimensional arrays and array references. An array evaluates
    to an array reference, which is a special value holding a location (where
    the elements of the array start) together with the size of the array;
    the elements of the array can be array references themselves (particularly
    when the array is multi-dimensional). Array references are ordinary values,
    so they can be assigned to variables and passed/received by functions.

  • Functions and function values. Functions can have zero or
    more parameters and can return abruptly using a return statement.
    SIMPLE follows a call-by-value parameter passing style, with static scoping.
    Function names evaluate to function abstractions, which hereby become ordinary
    values in the language, same like the array references.

  • Blocks with locals. SIMPLE variables can be declared
    anywhere, their scope being from the place where they are declared
    until the end of the most nested enclosing block.

  • Input/Output. The expression read() evaluates to the
    next value in the input buffer, and the statement write(e)
    evaluates e and outputs its value to the output buffer. The
    input and output buffers are lists of values.

  • Exceptions. SIMPLE has parametric exceptions (the value thrown as
    an exception can be caught and bound).

  • Concurrency via dynamic thread creation/termination and
    synchronization. One can spawn a thread to execute any statement.
    The spawned thread shares with its parent its environment at creation time.
    Threads can be synchronized via a join command which blocks the current thread
    until the joined thread completes, via re-entrant locks which can be acquired
    and released, as well as through rendezvous commands.

Like in many other languages, some of SIMPLE's constructs can be
desugared into a smaller set of basic constructs. We do that at the end
of the syntax module, and then we only give semantics to the core constructs.

Note: This definition is commented slightly more than others, because it is
intended to be one of the first non-trivial definitions that the new
user of K sees. We recommend the beginner user to first check the
language definitions discussed in the K tutorial.

module SIMPLE-UNTYPED-SYNTAX
  imports DOMAINS-SYNTAX

Syntax

We start by defining the SIMPLE syntax. The language constructs discussed
above have the expected syntax and evaluation strategies. Recall that in K
we annotate the syntax with appropriate strictness attributes, thus giving
each language construct the desired evaluation strategy.

Identifiers

Recall from the K tutorial that identifiers are builtin and come under the
syntactic category Id. The special identifier for the function
main belongs to all programs, and plays a special role in the semantics,
so we declare it explicitly. This would not be necessary if the identifiers
were all included automatically in semantic definitions, but that is not
possible because of parsing reasons (e.g., K variables used to match
concrete identifiers would then be ambiguously parsed as identifiers). They
are only included in the parser generated to parse programs (and used by the
kast tool). Consequently, we have to explicitly declare all the
concrete identifiers that play a special role in the semantics, like
main below.

  syntax Id ::= "main" [token]

Declarations

There are two types of declarations: for variables (including arrays) and
for functions. We are going to allow declarations of the form
var x=10, a[10,10], y=23;, which is why we allow the var
keyword to take a list of expressions. The non-terminals used in the two
productions below are defined shortly.

  syntax Stmt ::= "var" Exps ";"
                | "function" Id "(" Ids ")" Block

Expressions

The expression constructs below are standard. Increment (++) takes
an expression rather than a variable because it can also increment an array
element. Recall that the syntax we define in K is what we call the syntax
of the semantics
: while powerful enough to define non-trivial syntaxes
(thanks to the underlying SDF technology that we use), we typically refrain
from defining precise syntaxes, that is, ones which accept precisely the
well-formed programs (that would not be possible anyway in general). That job
is deferred to type systems, which can also be defined in K. In other words,
we are not making any effort to guarantee syntactically that only variables
or array elements are passed to the increment construct, we allow any
expression. Nevertheless, we will only give semantics to those, so expressions
of the form ++5, which parse (but which will be rejected by our type
system in the typed version of SIMPLE later), will get stuck when executed.
Arrays can be multidimensional and can hold other arrays, so their
lookup operation takes a list of expressions as argument and applies to an
expression (which can in particular be another array lookup), respectively.
The construct sizeOf gives the size of an array in number of elements
of its first dimension. Note that almost all constructs are strict. The only
constructs which are not strict are the increment (since its first argument
gets updated, so it cannot be evaluated), the input read which takes no
arguments so strictness is irrelevant for it, the logical and and or constructs
which are short-circuited, the thread spawning construct which creates a new
thread executing the argument expression and return its unique identifier to
the creating thread (so it cannot just evaluate its argument in place), and the
assignment which is only strict in its second argument (for the same reason as
the increment).

  syntax Exp ::= Int | Bool | String | Id
               | "(" Exp ")"             [bracket]
               | "++" Exp
               > Exp "[" Exps "]"        [strict]
               > Exp "(" Exps ")"        [strict]
               | "-" Exp                 [strict]
               | "sizeOf" "(" Exp ")"    [strict]
               | "read" "(" ")"
               > left:
                 Exp "*" Exp             [strict, left]
               | Exp "/" Exp             [strict, left]
               | Exp "%" Exp             [strict, left]
               > left:
                 Exp "+" Exp             [strict, left]
               | Exp "-" Exp             [strict, left]
               > non-assoc:
                 Exp "<" Exp             [strict, non-assoc]
               | Exp "<=" Exp            [strict, non-assoc]
               | Exp ">" Exp             [strict, non-assoc]
               | Exp ">=" Exp            [strict, non-assoc]
               | Exp "==" Exp            [strict, non-assoc]
               | Exp "!=" Exp            [strict, non-assoc]
               > "!" Exp                 [strict]
               > left:
                 Exp "&&" Exp            [strict(1), left]
               | Exp "||" Exp            [strict(1), left]
               > "spawn" Block
               > Exp "=" Exp             [strict(2), right]

We also need comma-separated lists of identifiers and of expressions.
Moreover, we want them to be strict, that is, to evaluate to lists of results
whenever requested (e.g., when they appear as strict arguments of
the constructs above).

  syntax Ids  ::= List{Id,","}
  syntax Exps ::= List{Exp,","}          [strict]  // automatically hybrid now
  syntax Exps ::= Ids
  syntax Val
  syntax Vals ::= List{Val,","}
  syntax Bottom
  syntax Bottoms ::= List{Bottom,","}
  syntax Ids ::= Bottoms

Statements

Most of the statement constructs are standard for imperative languages.
We syntactically distinguish between empty and non-empty blocks, because we
chose Stmts not to be a (;-separated) list of
Stmt. Variables can be declared anywhere inside a block, their scope
ending with the block. Expressions are allowed to be used for their side
effects only (followed by a semicolon ;). Functions are allowed
to abruptly return. The exceptions are parametric, i.e., one can throw a value
which is bound to the variable declared by catch. Threads can be
dynamically created and terminated, and can synchronize with join,
acquire, release and rendezvous. Note that the
strictness attributes obey the intended evaluation strategy of the various
constructs. In particular, the if-then-else construct is strict only in its
first argument (the if-then construct will be desugared into if-then-else),
while the loop constructs are not strict in any arguments. The print
statement construct is variadic, that is, it takes an arbitrary number of
arguments.

  syntax Block ::= "{" "}"
                | "{" Stmt "}"

  syntax Stmt ::= Block
                | Exp ";"                               [strict]
                | "if" "(" Exp ")" Block "else" Block   [avoid, strict(1)]
                | "if" "(" Exp ")" Block
                | "while" "(" Exp ")" Block
                | "for" "(" Stmt Exp ";" Exp ")" Block
                | "return" Exp ";"                      [strict]
                | "return" ";"
                | "print" "(" Exps ")" ";"              [strict]
// NOTE: print strict allows non-deterministic evaluation of its arguments
// Either keep like this but document, or otherwise make Exps seqstrict.
// Of define and use a different expression list here, which is seqstrict.
                | "try" Block "catch" "(" Id ")" Block
                | "throw" Exp ";"                       [strict]
                | "join" Exp ";"                        [strict]
                | "acquire" Exp ";"                     [strict]
                | "release" Exp ";"                     [strict]
                | "rendezvous" Exp ";"                  [strict]

The reason we allow Stmts as the first argument of for
instead of Stmt is because we want to allow more than one statement
to be executed when the loop is initialized. Also, as seens shorly, macros
may expand one statement into more statements; for example, an initialized
variable declaration statement var x=0; desugars into two statements,
namely var x; x=0;, so if we use Stmt instead of Stmts
in the production of for above then we risk that the macro expansion
of statement var x=0; happens before the macro expansion of for,
also shown below, in which case the latter would not apply anymore because
of syntactic mismatch.

  syntax Stmt ::= Stmt Stmt                          [right]

// I wish I were able to write the following instead, but confuses the parser.
//
// syntax Stmts ::= List{Stmt,""}
// syntax Top ::= Stmt | "function" Id "(" Ids ")" Block
// syntax Pgm ::= List{Top,""}
//
// With that, I could have also eliminated the empty block

Desugared Syntax

This part desugars some of SIMPLE's language constructs into core ones.
We only want to give semantics to core constructs, so we get rid of the
derived ones before we start the semantics. All desugaring macros below are
straightforward.

  rule if (E) S => if (E) S else {}                                 [macro]
  rule for(Start Cond; Step) {S} => {Start while (Cond) {S Step;}}  [macro]
  rule for(Start Cond; Step) {} => {Start while (Cond) {Step;}}     [macro]
  rule var E1:Exp, E2:Exp, Es:Exps; => var E1; var E2, Es;          [macro-rec]
  rule var X:Id = E; => var X; X = E;                               [macro]

For the semantics, we can therefore assume from now on that each
conditional has both branches, that there are only while loops, and
that each variable is declared alone and without any initialization as part of
the declaration.

endmodule


module SIMPLE-UNTYPED
  imports SIMPLE-UNTYPED-SYNTAX
  imports DOMAINS

Basic Semantic Infrastructure

Before one starts adding semantic rules to a K definition, one needs to
define the basic semantic infrastructure consisting of definitions for
values and configuration. As discussed in the definitions
in the K tutorial, the values are needed to know when to stop applying
the heating rules and when to start applying the cooling rules corresponding
to strictness or context declarations. The configuration serves as a backbone
for the process of configuration abstraction which allows users to only
mention the relevant cells in each semantic rule, the rest of the configuration
context being inferred automatically. Although in some cases the configuration
could be automatically inferred from the rules, we believe that it is very
useful for language designers/semanticists to actually think of and design
their configuration explicitly, so the current implementation of K requires
one to define it.

Values

We here define the values of the language that the various fragments of
programs evaluate to. First, integers and Booleans are values. As discussed,
arrays evaluate to special array reference values holding (1) a location from
where the array's elements are contiguously allocated in the store, and
(2) the size of the array. Functions evaluate to function values as
λ-abstractions (we do not need to evaluate functions to closures
because each function is executed in the fixed global environment and
function definitions cannot be nested). Like in IMP and other
languages, we finally tell the tool that values are K results.

  syntax Val ::= Int | Bool | String
               | array(Int,Int)
               | lambda(Ids,Stmt)
  syntax Exp ::= Val
  syntax Exps ::= Vals
  syntax Vals ::= Bottoms
  syntax KResult ::= Val
                   | Vals  // TODO: should not need this

The inclusion of values in expressions follows the methodology of
syntactic definitions (like, e.g., in SOS): extend the syntax of the language
to encompass all values and additional constructs needed to give semantics.
In addition to that, it allows us to write the semantic rules using the
original syntax of the language, and to parse them with the same (now extended
with additional values) parser. If writing the semantics directly on the K
AST, using the associated labels instead of the syntactic constructs, then one
would not need to include values in expressions.

Configuration

The K configuration of SIMPLE consists of a top level cell, T,
holding a threads cell, a global environment map cell genv
mapping the global variables and function names to their locations, a shared
store map cell store mapping each location to some value, a set cell
busy holding the locks which have been acquired but not yet released
by threads, a set cell terminated holding the unique identifiers of
the threads which already terminated (needed for join), input
and output list cells, and a nextLoc cell holding a natural
number indicating the next available location. Unlike in the small languages
in the K tutorial, where we used the fresh predicate to generate fresh
locations, in larger languages, like SIMPLE, we prefer to explicitly manage
memory. The location counter in nextLoc models an actual physical
location in the store; for simplicity, we assume arbitrarily large memory and
no garbage collection. The threads cell contains one thread
cell for each existing thread in the program. Note that the thread cell has
multiplicity *, which means that at any given moment there could be zero,
one or more thread cells. Each thread cell contains a
computation cell k, a control cell holding the various
control structures needed to jump to certain points of interest in the program
execution, a local environment map cell env mapping the thread local
variables to locations in the store, and finally a holds map cell
indicating what locks have been acquired by the thread and not released so far
and how many times (SIMPLE's locks are re-entrant). The control cell
currently contains only two subcells, a function stack fstack which
is a list and an exception stack xstack which is also a list.
One can add more control structures in the control cell, such as a
stack for break/continue of loops, etc., if the language is extended with more
control-changing constructs. Note that all cells except for k are
also initialized, in that they contain a ground term of their corresponding
sort. The k cell is initialized with the program that will be passed
to the K tool, as indicated by the $PGM variable, followed by the
execute task (defined shortly).

  // the syntax declarations below are required because the sorts are
  // referenced directly by a production and, because of the way KIL to KORE
  // is implemented, the configuration syntax is not available yet
  // should simply work once KIL is removed completely
  // check other definitions for this hack as well

  syntax ControlCell
  syntax ControlCellFragment

  configuration <T color="red">
                  <threads color="orange">
                    <thread multiplicity="*" color="yellow">
                      <k color="green"> $PGM:Stmt ~> execute </k>
                    //<br/> // TODO(KORE): support latex annotations #1799
                      <control color="cyan">
                        <fstack color="blue"> .List </fstack>
                        <xstack color="purple"> .List </xstack>
                      </control>
                    //<br/> // TODO(KORE): support latex annotations #1799
                      <env color="violet"> .Map </env>
                      <holds color="black"> .Map </holds>
                      <id color="pink"> 0 </id>
                    </thread>
                  </threads>
                //<br/> // TODO(KORE): support latex annotations #1799
                  <genv color="pink"> .Map </genv>
                  <store color="white"> .Map </store>
                  <busy color="cyan"> .Set </busy>
                  <terminated color="red"> .Set </terminated>
                //<br/> // TODO(KORE): support latex annotations #1799
                  <input color="magenta" stream="stdin"> .List </input>
                  <output color="brown" stream="stdout"> .List </output>
                  <nextLoc color="gray"> 0 </nextLoc>
                </T>

Declarations and Initialization

We start by defining the semantics of declarations (for variables,
arrays and functions).

Variable Declaration

The SIMPLE syntax was desugared above so that each variable is
declared alone and its initialization is done as a separate statement.
The semantic rule below matches resulting variable declarations of the
form var X; on top of the k cell
(indeed, note that the k cell is complete, or round, to the
left, and is torn, or ruptured, to the right), allocates a fresh
location L in the store which is initialized with a special value
(indeed, the unit ., or nothing, is matched anywhere
in the map ‒note the tears at both sides‒ and replaced with the
mapping L ↦ ⊥), and binds X to L in the local
environment shadowing previous declarations of X, if any.
This possible shadowing of X requires us to therefore update the
entire environment map, which is expensive and can significantly slow
down the execution of larger programs. On the other hand, since we know
that L is not already bound in the store, we simply add the binding
L ↦ ⊥ to the store, thus avoiding a potentially complete
traversal of the the store map in order to update it. We prefer the approach
used for updating the store whenever possible, because, in addition to being
faster, it offers more true concurrency than the latter; indeed, according
to the concurrent semantics of K, the store is not frozen while
L ↦ ⊥ is added to it, while the environment is frozen during the
update operation Env[L/X]. The variable declaration command is
also removed from the top of the computation cell and the fresh location
counter is incremented. The undefined symbol added in the store
is of sort KItem, instead of Val, on purpose; this way, the
store lookup rules will get stuck when one attempts to lookup an
uninitialized location. All the above happen in one transactional step,
with the rule below. Note also how configuration abstraction allows us to
only mention the needed cells; indeed, as the configuration above states,
the k and env cells are actually located within a
thread cell within the threads cell, but one needs
not mention these: the configuration context of the rule is
automatically transformed to match the declared configuration
structure.

  syntax KItem ::= "undefined"  [latex(\bot)]

  rule <k> var X:Id; => . ...</k>
       <env> Env => Env[X <- L] </env>
       <store>... .Map => L |-> undefined ...</store>
       <nextLoc> L => L +Int 1 </nextLoc>

Array Declaration

The K semantics of the uni-dimensional array declaration is somehow similar
to the above declaration of ordinary variables. First, note the
context declaration below, which requests the evaluation of the array
dimension. Once evaluated, say to a natural number N, then
N +Int 1 locations are allocated in the store for
an array of size N, the additional location (chosen to be the first
one allocated) holding the array reference value. The array reference
value array(L,N) states that the array has size N and its
elements are located contiguously in the store starting with location
L. The operation L … L' ↦ V, defined at the end of this
file in the auxiliary operation section, initializes each location in
the list L … L' to V. Note that, since the dimensions of
array declarations can be arbitrary expressions, this virtually means
that we can dynamically allocate memory in SIMPLE by means of array
declarations.

  context var _:Id[HOLE];

  rule <k> var X:Id[N:Int]; => . ...</k>
       <env> Env => Env[X <- L] </env>
       <store>... .Map => L |-> array(L +Int 1, N)
                          (L +Int 1) ... (L +Int N) |-> undefined ...</store>
       <nextLoc> L => L +Int 1 +Int N </nextLoc>
    requires N >=Int 0

SIMPLE allows multi-dimensional arrays. For semantic simplicity, we
desugar them all into uni-dimensional arrays by code transformation.
This way, we only need to give semantics to uni-dimensional arrays.
First, note that the context rule above actually evaluates all the array
dimensions (that's why we defined the expression lists strict!):
Upon evaluating the array dimensions, the code generation rule below
desugars multi-dimensional array declaration to uni-dimensional declarations.
To this aim, we introduce two special unique variable identifiers,
$1 and $2. The first variable, $1, iterates
through and initializes each element of the first dimension with an array
of the remaining dimensions, declared as variable $2:

  syntax Id ::= "$1" | "$2"
  rule var X:Id[N1:Int, N2:Int, Vs:Vals];
    => var X[N1];
       {
         for(var $1 = 0; $1 <= N1 - 1; ++$1) {
           var $2[N2, Vs];
           X[$1] = $2;
         }
       }
    [structural]

Ideally, one would like to perform syntactic desugarings like the one
above before the actual semantics. Unfortunately, that was not possible in
this case because the dimension expressions of the multi-dimensional array need
to be evaluated first. Indeed, the desugaring rule above does not work if the
dimensions of the declared array are arbitrary expressions, because they can
have side effects (e.g., a[++x,++x]) and those side effects would be
propagated each time the expression is evaluated in the desugaring code (note
that both the loop condition and the nested multi-dimensional declaration
would need to evaluate the expressions given as array dimensions).

Function declaration

Functions are evaluated to λ-abstractions and stored like any other
values in the store. A binding is added into the environment for the function
name to the location holding its body. Similarly to the C language, SIMPLE
only allows function declarations at the top level of the program. More
precisely, the subsequent semantics of SIMPLE only works well when one
respects this requirement. Indeed, the simplistic context-free parser
generated by the grammar above is more generous than we may want, in that it
allows function declarations anywhere any declaration is allowed, including
inside arbitrary blocks. However, as the rule below shows, we are not
storing the declaration environment with the λ-abstraction value as
closures do. Instead, as seen shortly, we switch to the global environment
whenever functions are invoked, which is consistent with our requirement that
functions should only be declared at the top. Thus, if one declares local
functions, then one may see unexpected behaviors (e.g., when one shadows a
global variable before declaring a local function). The type checker of
SIMPLE, also defined in K (see examples/simple/typed/static),
discards programs which do not respect this requirement.

  rule <k> function F(Xs) S => . ...</k>
       <env> Env => Env[F <- L] </env>
       <store>... .Map => L |-> lambda(Xs, S) ...</store>
       <nextLoc> L => L +Int 1 </nextLoc>

When we are done with the first pass (pre-processing), the computation
cell k contains only the token execute (see the configuration
declaration above, where the computation item execute was placed
right after the program in the k cell of the initial configuration)
and the cell genv is empty. In this case, we have to call
main() and to initialize the global environment by transferring the
contents of the local environment into it. We prefer to do it this way, as
opposed to processing all the top level declarations directly within the global
environment, because we want to avoid duplication of semantics: the syntax of
the global declarations is identical to that of their corresponding local
declarations, so the semantics of the latter suffices provided that we copy
the local environment into the global one once we are done with the
pre-processing. We want this separate pre-processing step precisely because
we want to create the global environment. All (top-level) functions end up
having their names bound in the global environment and, as seen below, they
are executed in that same global environment; all these mean, in particular,
that the functions "see" each other, allowing for mutual recursion, etc.

  syntax KItem ::= "execute"
  rule <k> execute => main(.Exps); </k>
       <env> Env </env>
       <genv> .Map => Env </genv>  [structural]

Expressions

We next define the K semantics of all the expression constructs.

Variable lookup

When a variable X is the first computational task, and X is bound to some
location L in the environment, and L is mapped to some value V in the
store, then we rewrite X into V:

  rule <k> X:Id => V ...</k>
       <env>... X |-> L ...</env>
       <store>... L |-> V:Val ...</store>  [lookup]

Note that the rule above excludes reading , because is not
a value and V is checked at runtime to be a value.

Variable/Array increment

This is tricky, because we want to allow both ++x and ++a[5].
Therefore, we need to extract the lvalue of the expression to increment.
To do that, we state that the expression to increment should be wrapped
by the auxiliary lvalue operation and then evaluated. The semantics
of this auxiliary operation is defined at the end of this file. For now, all
we need to know is that it takes an expression and evaluates to a location
value. Location values, also defined at the end of the file, are integers
wrapped with the operation loc, to distinguish them from ordinary
integers.

  context ++(HOLE => lvalue(HOLE))
  rule <k> ++loc(L) => I +Int 1 ...</k>
       <store>... L |-> (I => I +Int 1) ...</store>  [increment]

Arithmetic operators

There is nothing special about the following rules. They rewrite the
language constructs to their library counterparts when their arguments
become values of expected sorts:

  rule I1 + I2 => I1 +Int I2
  rule Str1 + Str2 => Str1 +String Str2
  rule I1 - I2 => I1 -Int I2
  rule I1 * I2 => I1 *Int I2
  rule I1 / I2 => I1 /Int I2 requires I2 =/=K 0
  rule I1 % I2 => I1 %Int I2 requires I2 =/=K 0
  rule - I => 0 -Int I
  rule I1 < I2 => I1 <Int I2
  rule I1 <= I2 => I1 <=Int I2
  rule I1 > I2 => I1 >Int I2
  rule I1 >= I2 => I1 >=Int I2

The equality and inequality constructs reduce to syntactic comparison
of the two argument values (which is what the equality on K terms does).

  rule V1:Val == V2:Val => V1 ==K V2
  rule V1:Val != V2:Val => V1 =/=K V2

The logical negation is clear, but the logical conjunction and disjunction
are short-circuited:

  rule ! T => notBool(T)
  rule true  && E => E
  rule false && _ => false
  rule true  || _ => true
  rule false || E => E

Array lookup

Untyped SIMPLE does not check array bounds (the dynamically typed version of
it, in examples/simple/typed/dynamic, does check for array out of
bounds). The first rule below desugars the multi-dimensional array access to
uni-dimensional array access; recall that the array access operation was
declared strict, so all sub-expressions involved are already values at this
stage. The second rule rewrites the array access to a lookup operation at a
precise location; we prefer to do it this way to avoid locking the store.
The semantics of the auxiliary lookup operation is straightforward,
and is defined at the end of the file.

// The [anywhere] feature is underused, because it would only be used
// at the top of the computation or inside the lvalue wrapper. So it
// may not be worth, or we may need to come up with a special notation
// allowing us to enumerate contexts for [anywhere] rules.
  rule V:Val[N1:Int, N2:Int, Vs:Vals] => V[N1][N2, Vs]
    [structural, anywhere]

  rule array(L,_)[N:Int] => lookup(L +Int N)
    [structural, anywhere]

Size of an array

The size of the array is stored in the array reference value, and the
sizeOf construct was declared strict, so:

  rule sizeOf(array(_,N)) => N

Function call

Function application was strict in both its arguments, so we can
assume that both the function and its arguments are evaluated to
values (the former expected to be a λ-abstraction). The first
rule below matches a well-formed function application on top of the
computation and performs the following steps atomically: it switches
to the function body followed by return; (for the case in
which the function does not use an explicit return statement); it
pushes the remaining computation, the current environment, and the
current control data onto the function stack (the remaining
computation can thus also be discarded from the computation cell,
because an unavoidable subsequent return statement ‒see
above‒ will always recover it from the stack); it switches the
current environment (which is being pushed on the function stack) to
the global environment, which is where the free variables in the
function body should be looked up; it binds the formal parameters to
fresh locations in the new environment, and stores the actual
arguments to those locations in the store (this latter step is easily
done by reducing the problem to variable declarations, whose semantics
we have already defined; the auxiliary operation mkDecls is
defined at the end of the file). The second rule pops the
computation, the environment and the control data from the function
stack when a return statement is encountered as the next
computational task, passing the returned value to the popped
computation (the popped computation was the context in which the
returning function was called). Note that the pushing/popping of the
control data is crucial. Without it, one may have a function that
contains an exception block with a return statement inside, which
would put the xstack cell in an inconsistent state (since the
exception block modifies it, but that modification should be
irrelevant once the function returns). We add an artificial
nothing value to the language, which is returned by the
nulary return; statements.

  syntax KItem ::=  (Map,K,ControlCellFragment)

  rule <k> lambda(Xs,S)(Vs:Vals) ~> K => mkDecls(Xs,Vs) S return; </k>
       <control>
         <fstack> .List => ListItem((Env,K,C)) ...</fstack>
         C
       </control>
       <env> Env => GEnv </env>
       <genv> GEnv </genv>

  rule <k> return(V:Val); ~> _ => V ~> K </k>
       <control>
         <fstack> ListItem((Env,K,C)) => .List ...</fstack>
         (_ => C)
       </control>
       <env> _ => Env </env>

  syntax Val ::= "nothing"
  rule return; => return nothing;   [macro]

Like for division-by-zero, it is left unspecified what happens
when the nothing value is used in domain calculations. For
example, from the perspective of the language semantics,
7 +Int nothing can evaluate to anything, or
may not evaluate at all (be undefined). If one wants to make sure that
such artificial values are never misused, then one needs to define a static
checker (also using K, like our type checker in
examples/simple/typed/static) and reject programs that misuse them.
Note that, unlike the undefined symbol, which had the sort K
instead of Val, we defined nothing to be a value. That
is because, as explained above, we do not want the program to get
stuck when nothing is returned by a function. Instead, we want the
behavior to be unspecified; in particular, if one is careful to never
use the returned value in domain computations, as happens when we
call a function for its side effects (e.g., with a statement of the
form f(x);), then the program does not get stuck.

Read

The read() expression construct simply evaluates to the next
input value, at the same time discarding the input value from the
in cell.

  rule <k> read() => I ...</k> <input> ListItem(I:Int) => .List ...</input>  [read]

Assignment

In SIMPLE, like in C, assignments are expression constructs and not statement
constructs. To make it a statement all one needs to do is to follow it by a
semi-colon ; (see the semantics for expression statements below).
Like for the increment, we want to allow assignments not only to variables but
also to array elements, e.g., e1[e2] = e3 where e1 evaluates
to an array reference, e2 to a natural number, and e3 to any
value. Thus, we first compute the lvalue of the left-hand-side expression
that appears in an assignment, and then we do the actual assignment to the
resulting location:

  context (HOLE => lvalue(HOLE)) = _

  rule <k> loc(L) = V:Val => V ...</k> <store>... L |-> (_ => V) ...</store>
    [assignment]

Statements

We next define the K semantics of statements.

Blocks

Empty blocks are simply discarded, as shown in the first rule below.
For non-empty blocks, we schedule the enclosed statement but we have to
make sure the environment is recovered after the enclosed statement executes.
Recall that we allow local variable declarations, whose scope is the block
enclosing them. That is the reason for which we have to recover the
environment after the block. This allows us to have a very simple semantics
for variable declarations, as we did above. One can make the two rules below
computational if one wants them to count as computational steps.

  rule {} => .  [structural]
  rule <k> { S } => S ~> setEnv(Env) ...</k>  <env> Env </env>  [structural]
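
For concreteness, the following (hypothetical) SIMPLE fragment shows
environment recovery at work: the inner declaration of x shadows the
outer one only until its enclosing block completes:

  // Hypothetical SIMPLE program: block-local declarations and shadowing.
  function main() {
    var x = 1;
    { var x = 2; print(x, "\n"); }  // expected output: 2
    print(x, "\n");                 // expected output: 1 (env recovered)
  }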

The basic definition of environment recovery is straightforward and
given in the section on auxiliary constructs at the end of the file.

There are two common alternatives to the above semantics of blocks.
One is to keep track of the variables which are declared in the block and only
recover those at the end of the block. This way one does more work for
variable declarations but conceptually less work for environment recovery; we
say conceptually because it is not clear that it is indeed the case that
one does less work when AC matching is involved. The other alternative is to
work with a stack of environments instead of a flat environment, and push the
current environment when entering a block and pop it when exiting it. This
way, one does more work when accessing variables (since one has to search the
variable in the environment stack in a top-down manner), but on the other hand
uses smaller environments and the definition gets closer to an implementation.
Based on experience with dozens of language semantics and other K definitions,
we have found that our approach above is the best trade-off between elegance
and efficiency (especially since rewrite engines have built-in techniques to
lazily copy terms, by need, thus not creating unnecessary copies),
so it is the one that we follow in general.

Sequential composition

Sequential composition is desugared into K's builtin sequentialization
operation (recall that, like in C, the semi-colon ; is not a
statement separator in SIMPLE: it is either a statement terminator or a
construct for turning an expression into a statement). The rule below is
structural, so it does not count as a computational step. One can make it
computational if one wants it to count as a step. Note that K allows us
to define the semantics of SIMPLE in such a way that statements eventually
dissolve from the top of the computation when they are completed; this is in
sharp contrast to (artificially) evaluating them to a special
skip statement value and then getting rid of that special value, as
is the case in other semantic approaches (where everything must evaluate
to something). This means that once S₁ completes in the rule below, S₂
automatically becomes the next computational item without any additional
(explicit or implicit) rules.

  rule S1:Stmts S2:Stmts => S1 ~> S2  [structural]

A subtle aspect of the rule above is that S₁ is declared to have sort
Stmts and not Stmt. That is because desugaring macros can indeed
produce left associative sequential composition of statements. For example,
the code var x=0; x=1; is desugared to
(var x; x=0;) x=1;, so although originally the first term of
the sequential composition had sort Stmt, after desugaring it became
of sort Stmts. Note that the attribute [right] associated
with the sequential composition production is an attribute of the syntax, and not
of the semantics: e.g., it tells the parser to parse
var x; x=0; x=1; as var x; (x=0; x=1;), but it
does not tell the rewrite engine to rewrite (var x; x=0;) x=1; to
var x; (x=0; x=1;).

Expression statements

Expression statements are only used for their side effects, so their result
value is simply discarded. Common examples of expression statements are ones
of the form ++x;, x=e;, e1[e2]=e3;, etc.

  rule _:Val; => .

Conditional

Since the conditional was declared with the strict(1) attribute, we
can assume that its first argument will eventually be evaluated. The rules
below cover the only two possibilities in which the conditional is allowed to
proceed (otherwise the rewriting process gets stuck).

  rule if ( true) S else _ => S
  rule if (false) _ else S => S

While loop

The simplest way to give the semantics of the while loop is by unrolling.
Note, however, that its unrolling is only allowed when the while loop reaches
the top of the computation (to avoid non-termination of unrolling). We prefer
the rule below to be structural, because we don't want the unrolling of the
while loop to count as a computational step; this is unavoidable in
conventional semantics, but it is possible in K thanks to its distinction
between structural and computational rules. The simple while loop semantics
below works because our while loops in SIMPLE are indeed very basic. If we
allowed break/continue of loops then we would need a completely different
semantics, which would also involve the control cell.

  rule while (E) S => if (E) {S while(E)S}  [structural]
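
For example, the loop in the following (hypothetical) program unrolls
once per iteration, each unrolling being a structural (and thus
uncounted) step:

  // Hypothetical SIMPLE program: summing 1 through 5 with a while loop.
  function main() {
    var i = 1;
    var s = 0;
    while (i <= 5) { s = s + i; ++i; }
    print(s, "\n");  // expected output: 15
  }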

Print

The print statement was strict, so all its arguments are now
evaluated (recall that print is variadic). We append each of
its evaluated arguments to the output buffer, and discard the residual
print statement with an empty list of arguments.

  rule <k> print((V:Val, Es) => Es); ...</k> <output>... .List => ListItem(V) </output>
    [print]
  rule print(.Vals); => .  [structural]

Exceptions

SIMPLE allows parametric exceptions, in that one can throw and catch a
particular value. The statement try S₁ catch(X) S₂
proceeds with the evaluation of S₁. If S₁ evaluates normally, i.e.,
without any exception thrown, then S₂ is discarded and the execution
continues normally. If S₁ throws an exception with a statement of the
form throw E, then E is first evaluated to some value V
(throw was declared to be strict), then V is bound to X, then
S₂ is evaluated in the new environment while the remainder of S₁ is
discarded, then the environment is recovered and the execution continues
normally with the statement following the try S₁ catch(X) S₂ statement.
Exceptions can be nested and the statements in the
catch part (S₂ in our case) can throw exceptions to the
upper level. One should be careful with how one handles the control data
structures here, so that the abrupt changes of control due to exception
throwing and to function returns interact correctly with each other.
For example, we want to allow function calls inside the statement S₁ in
a try S₁ catch(X) S₂ block which can throw an exception
that is not caught by the function but instead is propagated to the
try S₁ catch(X) S₂ block that called the function.
Therefore, we have to make sure that the function stack as well as other
potential control structures are also properly modified when the exception
is thrown to correctly recover the execution context. This can be easily
achieved by pushing/popping the entire current control context onto the
exception stack. The three rules below modularly do precisely the above.

  syntax KItem ::= (Id,Stmt,K,Map,ControlCellFragment)

  syntax KItem ::= "popx"

  rule <k> (try S1 catch(X) {S2} => S1 ~> popx) ~> K </k>
       <control>
         <xstack> .List => ListItem((X, S2, K, Env, C)) ...</xstack>
         C
       </control>
       <env> Env </env>

  rule <k> popx => . ...</k>
       <xstack> ListItem(_) => .List ...</xstack>

  rule <k> throw V:Val; ~> _ => { var X = V; S2 } ~> K </k>
       <control>
         <xstack> ListItem((X, S2, K, Env, C)) => .List ...</xstack>
         (_ => C)
       </control>
       <env> _ => Env </env>

The catch statement S₂ needs to be executed in the original environment,
but where the thrown value V is bound to the catch variable X. We here
chose to rely on two previously defined constructs when giving semantics to
the catch part of the statement: (1) the variable declaration with
initialization, for binding X to V; and (2) the block construct for
preventing X from shadowing variables in the original environment upon the
completion of S₂.
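
The following (hypothetical) SIMPLE program illustrates the point about
control data made above: the exception thrown inside f is not caught
within f, yet the rules above correctly propagate it to the try block
of the caller:

  // Hypothetical SIMPLE program: an exception propagating out of a function.
  function f(x) {
    if (x < 0) { throw x; }  // abruptly leaves f as well
    return x;
  }

  function main() {
    try { print(f(-7), "\n"); }
    catch(e) { print("caught ", e, "\n"); }  // expected output: caught -7
  }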

Threads

SIMPLE's threads can be created and terminated dynamically, and can
synchronize by acquiring and releasing re-entrant locks and by rendezvous.
We discuss the seven rules giving the semantics of these operations below.

Thread creation

Threads can be created by any other threads using the spawn S
construct. The spawn expression construct evaluates to the unique identifier
of the newly created thread and, at the same time, a new thread cell is added
into the configuration, initialized with the S statement and sharing the
same environment with the parent thread. Note that the newly created
thread cell is only partially specified: the remaining cells are added
and initialized automatically as described in the definition of SIMPLE's
configuration. This is part of K's configuration abstraction mechanism.

  rule <thread>...
         <k> spawn S => !T:Int ...</k>
         <env> Env </env>
       ...</thread>
       (.Bag => <thread>...
               <k> S </k>
               <env> Env </env>
               <id> !T </id>
             ...</thread>)

Thread termination

Dually to the above, when a thread terminates its assigned computation (the
contents of its k cell) is empty, so the thread can be dissolved.
However, since no discipline is imposed on how locks are acquired and released,
it can be the case that a terminating thread still holds locks. Those locks
must be released, so other threads attempting to acquire them do not deadlock.
We achieve that by removing all the locks held by the terminating thread in its
holds cell from the set of busy locks in the busy cell
(keys(H) returns the domain of the map H as a set, that is, only
the locks themselves ignoring their multiplicity). As seen below, a lock is
added to the busy cell as soon as it is acquired for the first time
by a thread. The unique identifier of the terminated thread is also collected
into the terminated cell, so the join construct knows which
threads have terminated.

  rule (<thread>... <k>.</k> <holds>H</holds> <id>T</id> ...</thread> => .Bag)
       <busy> Busy => Busy -Set keys(H) </busy>
       <terminated>... .Set => SetItem(T) ...</terminated>

Thread joining

Thread joining is now straightforward: all we need to do is to check whether
the identifier of the thread to be joined is in the terminated cell.
If yes, then the join statement dissolves and the joining thread
continues normally; if not, then the joining thread gets stuck.

  rule <k> join T:Int; => . ...</k>
       <terminated>... SetItem(T) ...</terminated>
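
Here is a small (hypothetical) program exercising both thread creation
and thread joining: the child thread shares the parent's environment,
so its write to x is visible to the parent once join succeeds:

  // Hypothetical SIMPLE program: spawning and joining a thread.
  function main() {
    var x = 1;
    var t = spawn { x = x + 10; };  // shares the parent's environment
    join t;                         // blocks until the child terminates
    print(x, "\n");                 // expected output: 11
  }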

Acquire lock

There are two cases to distinguish when a thread attempts to acquire a lock
(in SIMPLE any value can be used as a lock):
(1) The thread does not currently have the lock, in which case it has to
take it provided that the lock is not already taken by another thread (see
the side condition of the first rule).
(2) The thread already has the lock, in which case it just increments its
counter for the lock (the locks are re-entrant). These two cases are captured
by the two rules below:

  rule <k> acquire V:Val; => . ...</k>
       <holds>... .Map => V |-> 0 ...</holds>
       <busy> Busy (.Set => SetItem(V)) </busy>
    requires (notBool(V in Busy))  [acquire]

  rule <k> acquire V; => . ...</k>
       <holds>... V:Val |-> (N => N +Int 1) ...</holds>

Release lock

Similarly, there are two corresponding cases to distinguish when a thread
releases a lock:
(1) The thread holds the lock more than once, in which case all it needs to do
is to decrement the lock counter.
(2) The thread holds the lock only once, in which case it needs to remove it
from its holds cell and also from the shared busy cell,
so other threads can acquire it if they need to.

  rule <k> release V:Val; => . ...</k>
       <holds>... V |-> (N => N -Int 1) ...</holds>
    requires N >Int 0

  rule <k> release V; => . ...</k> <holds>... V:Val |-> 0 => .Map ...</holds>
       <busy>... SetItem(V) => .Set ...</busy>
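
For illustration, the following (hypothetical) program uses the string
"lock" as a lock value (recall that any value can serve as a lock) to
make the two increments of x mutually exclusive:

  // Hypothetical SIMPLE program: protecting a shared variable with a lock.
  function main() {
    var x = 0;
    var t = spawn { acquire "lock"; x = x + 1; release "lock"; };
    acquire "lock"; x = x + 1; release "lock";
    join t;
    print(x, "\n");  // expected output: 2, under every interleaving
  }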

Rendezvous synchronization

In addition to synchronization through acquire and release of locks, SIMPLE
also provides a construct for rendezvous synchronization. A thread whose next
statement to execute is rendezvous(V) gets stuck until another
thread reaches an identical statement; when that happens, the two threads
drop their rendezvous statements and continue their executions. If three
threads happen to have an identical rendezvous statement as their next
statement, then precisely two of them will synchronize and the other will
remain blocked until another thread reaches a similar rendezvous statement.
The rule below is as simple as it can be. Note, however, that, again, it is
K's configuration abstraction mechanism that makes it work as desired:
since the only cell that can multiply and that contains a k cell is
the thread cell, the only way to concretize the rule below to the
actual configuration of SIMPLE is to include each k cell in a
thread cell.

  rule <k> rendezvous V:Val; => . ...</k>
       <k> rendezvous V; => . ...</k>  [rendezvous]
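
For example, in the following (hypothetical) program neither thread can
print its message before the other thread has also reached its
rendezvous statement:

  // Hypothetical SIMPLE program: two threads meeting at a rendezvous point.
  function main() {
    spawn { rendezvous 1; print("done (child)\n"); };
    rendezvous 1;  // blocks until the child reaches its own rendezvous
    print("done (parent)\n");
  }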

Auxiliary declarations and operations

In this section we define all the auxiliary constructs used in the
above semantics.

Making declarations

The mkDecls auxiliary construct turns a list of identifiers
and a list of values into a sequence of corresponding variable
declarations.

  syntax Stmt ::= mkDecls(Ids,Vals)  [function]
  rule mkDecls((X:Id, Xs:Ids), (V:Val, Vs:Vals)) => var X=V; mkDecls(Xs,Vs)
  rule mkDecls(.Ids,.Vals) => {}
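
For instance, assuming (hypothetical) identifiers x and y, the two
rules above expand a call to mkDecls as follows:

  // Worked expansion:
  //   mkDecls((x, y, .Ids), (1, 2, .Vals))
  //   => var x = 1; mkDecls((y, .Ids), (2, .Vals))
  //   => var x = 1; var y = 2; mkDecls(.Ids, .Vals)
  //   => var x = 1; var y = 2; {}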

Location lookup

The operation below is straightforward. Note that we tag it with the same
lookup tag as the variable lookup rule defined above. This way,
both rules will be considered transitions when we include the lookup
tag in the transition option of kompile.

  syntax Exp ::= lookup(Int)
  rule <k> lookup(L) => V ...</k> <store>... L |-> V:Val ...</store>  [lookup]

Environment recovery

We have already discussed the environment recovery auxiliary operation in the
IMP++ tutorial:

// TODO: eliminate the env wrapper, like we did in IMP++

  syntax KItem ::= setEnv(Map)
  rule <k> setEnv(Env) => . ...</k> <env> _ => Env </env>  [structural]

While theoretically sufficient, the basic definition for environment
recovery alone is suboptimal. Consider a loop while (E)S,
whose semantics (see above) was given by unrolling. S
is a block. Then the semantics of blocks above, together with the
unrolling semantics of the while loop, will yield a computation
structure in the k cell that increasingly grows, adding a new
environment recovery task right in front of the already existing sequence of
similar environment recovery tasks (this phenomenon is similar to the “tail
recursion” problem). Of course, when we have a sequence of environment
recovery tasks, we only need to keep the last one. The elegant rule below
does precisely that, thus avoiding the unnecessary computation explosion
problem:

  rule (setEnv(_) => .) ~> setEnv(_)  [structural]
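
Schematically, two adjacent recovery tasks collapse into the last one
(Env and Env' are hypothetical environments):

  //   setEnv(Env) ~> setEnv(Env') ~> K
  // becomes, by the rule above,
  //   setEnv(Env') ~> K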

In fact, the above follows a common convention in K for recovery
operations of cell contents: the meaning of a computation task of the form
cell(C) that reaches the top of the computation is that the current
contents of cell cell is discarded and gets replaced with C. We
did not add support for these special computation tasks in our current
implementation of K, so we need to define them as above.

lvalue and loc

For convenience in giving the semantics of constructs like the increment and
the assignment, which we want to operate the same way on variables and on
array elements, we used an auxiliary lvalue(E) construct which was
expected to evaluate to the lvalue of the expression E. This is only
defined when E has an lvalue, that is, when E is either a variable or
evaluates to an array element. lvalue(E) evaluates to a value of
the form loc(L), where L is the location where the value of E
can be found; for clarity, we use loc to structurally distinguish
natural numbers from location values. In giving semantics to lvalue
there are two cases to consider. (1) If E is a variable, then all we need
to do is to grab its location from the environment. (2) If E is an array
element, then we first evaluate the array and its index in order to identify
the exact location of the element of concern, and then return that location;
the last rule below works because its preceding context declarations ensure
that the array and its index are evaluated, and then the rule for array lookup
(defined above) rewrites the evaluated array access construct to its
corresponding store lookup operation.

// For parsing reasons, we prefer to allow lvalue to take a K

  syntax Exp ::= lvalue(K)
  syntax Val ::= loc(Int)

// Local variable

  rule <k> lvalue(X:Id => loc(L)) ...</k> <env>... X |-> L:Int ...</env>
    [structural]

// Array element: evaluate the array and its index;
// then the array lookup rule above applies.

  context lvalue(_::Exp[HOLE::Exps])
  context lvalue(HOLE::Exp[_::Exps])

// Finally, return the location of the desired array element

  rule lvalue(lookup(L:Int) => loc(L))  [structural]
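
To summarize, here is a worked derivation, assuming a (hypothetical)
variable a whose location in the store holds the value array(L,10):

  //   lvalue(a[3])
  //   => lvalue(array(L,10)[3])    // the contexts above evaluate array and index
  //   => lvalue(lookup(L +Int 3))  // the [anywhere] array lookup rule applies
  //   => loc(L +Int 3)             // the structural rule above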

Initializing multiple locations

The following operation initializes a sequence of locations with the same
value:

  syntax Map ::= Int "..." Int "|->" K
    [function, latex({#1}\ldots{#2}\mapsto{#3})]
  rule N...M |-> _ => .Map  requires N >Int M
  rule N...M |-> K => N |-> K (N +Int 1)...M |-> K  requires N <=Int M
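
For example, initializing locations 3 through 5 with the value 0
unfolds as follows:

  //   3...5 |-> 0
  //   => 3 |-> 0  (4...5 |-> 0)
  //   => 3 |-> 0  4 |-> 0  (5...5 |-> 0)
  //   => 3 |-> 0  4 |-> 0  5 |-> 0  (6...5 |-> 0)
  //   => 3 |-> 0  4 |-> 0  5 |-> 0                 // 6 > 5, so .Map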

The semantics of SIMPLE is now complete. Make sure you kompile the
definition with the right options in order to generate the desired model.
No kompile options are needed if you only want to execute the definition
(and thus get an interpreter), but if you want to search for different
program behaviors then you need to kompile with the transition option
including rule tags such as lookup, increment, acquire, etc. See the
IMP++ tutorial for what the transition option means and how to use it.

endmodule

Go to Lesson 2, SIMPLE typed static

SIMPLE — Typed — Static

Author: Grigore Roșu (grosu@illinois.edu)
Organization: University of Illinois at Urbana-Champaign

Author: Traian Florin Șerbănuță (traian.serbanuta@unibuc.ro)
Organization: University of Bucharest

Abstract

This is the K definition of the static semantics of the typed SIMPLE
language, or in other words, a type system for the typed SIMPLE
language in K. We do not re-discuss the various features of the
SIMPLE language here. The reader is referred to the untyped version of
the language for such discussions. We here only focus on the new and
interesting problems raised by the addition of type declarations, and
what it takes to devise a type system/checker for the language.

When designing a type system for a language, no matter in what
paradigm, we have to decide upon the intended typing policy. Note
that we can have multiple type systems for the same language, one for
each typing policy. For example, should we accept programs which
don't have a main function? Or should we allow functions that do not
return explicitly? Or should we allow functions whose type expects
them to return a value (say an int) to use a plain
return; statement, which returns no value, like in C?
And so on and so forth. Typically, there are two opposite tensions
when designing a type system. On the one hand, you want your type
system to be as permissive as possible, that is, to accept as many
programs that do not get stuck when executed with the untyped
semantics as possible; this will keep the programmers using your
language happy. On the other hand, you want your type system to have
a reasonable performance when implemented; this will keep both the
programmers and the implementers of your language happy. For example,
a type system for rejecting programs that could perform
division-by-zero is not expected to be feasible in general. A simple
guideline when designing typing policies is to imagine how the
semantics of the untyped language may get stuck and try to prevent
those situations from happening.

Before we give the K type system of SIMPLE formally, we discuss,
informally, the intended typing policy:

  • Each program should contain a main() function. Indeed,
    the untyped SIMPLE semantics will get stuck on any program which does
    not have a main function.

  • Each primitive value has its own type, which can be int,
    bool, or string. There is also a type void
    for nonexistent values, for example for the result of a function meant
    to return no value (but only be used for its side effects, like a
    procedure).

  • The syntax of untyped SIMPLE is extended to allow type
    declarations for all the variables, including array variables. This is
    done in C/Java style. For example, int x; or
    int x=7, y=x+3;, or int[][][] a[10,20];
    (the latter defines a 10 × 20 matrix of arrays of integers).
    Recall from untyped SIMPLE that, unlike in C/Java, our multi-dimensional
    arrays use comma-separated arguments, although they have the array-of-array
    semantics.

  • Functions are also typed in a C/Java style. However, since in SIMPLE
    we allow functions to be passed to and returned by other functions, we also
    need function types. We will use the conventional higher-order arrow-notation
    for function types, but will separate the argument types with commas. For
    example, a function returning an array of bool elements and
    taking as argument an array x of two-integer-argument functions
    returning an integer, is declared using a syntax of the form
    bool[] f(((int,int)->int)[] x) { ... }
    and has the type ((int,int)->int)[] -> bool[].

  • We allow any variable declarations at the top level. Functions
    can only be declared at the top level. Each function can only access the
    other functions and variables declared at the top level, or its own locally
    declared variables. SIMPLE has static scoping.

  • The various expression and statement constructs take only elements of
    the expected types.

  • Increment and assignment can operate both on variables and on array
    elements. For example, if f has type int->int[][] and
    function g has the type int->int, then the
    increment expression ++f(7)[g(2),g(3)] is valid.

  • Functions should only return values of their declared result
    type. To give the programmers more flexibility, we allow functions to
    use return; statements to terminate without returning an
    actual value, or to not explicitly use any return statement,
    regardless of their declared return type. This flexibility can be
    handy when writing programs using certain functions only for their
    side effects. Nevertheless, as the dynamic semantics shows, a return
    value is automatically generated when an explicit return
    statement is not encountered.

  • For simplicity, we here limit exceptions to only throw and catch
    integer values. We leave it as an exercise to the reader to extend the
    semantics to allow throwing and catching exceptions of arbitrary types.
    As in languages like Java, one can go even further and
    define a semantics where thrown exceptions are propagated through
    try-catch statements until one of the corresponding type is found.
    We will do this when we define the KOOL language, not here.
    To keep the definition of SIMPLE simple, here we do not attempt to
    reject programs which throw uncaught exceptions.

Like in untyped SIMPLE, some constructs can be desugared into a
smaller set of basic constructs. In general, it should be clear why a
program does not type-check by looking at the top of the k cells in
its stuck configuration.

module SIMPLE-TYPED-STATIC-SYNTAX
  imports DOMAINS-SYNTAX

Syntax

The syntax of typed SIMPLE extends that of untyped SIMPLE with support
for declaring types to variables and functions.

  syntax Id ::= "main" [token]

Types

Primitive, array and function types, as well as lists (or tuples) of types.
The lists of types are useful for function arguments.

  syntax Type ::= "void" | "int" | "bool" | "string"
                | Type "[" "]"
                | "(" Type ")"             [bracket]
                > Types "->" Type

  syntax Types ::= List{Type,","}

Declarations

Variable and function declarations have the expected syntax. For variables,
we basically just replaced the var keyword of untyped SIMPLE with a
type. For functions, besides replacing the function keyword with a
type, we also introduce a new syntactic category for typed variables,
Param, and lists over it.

  syntax Param ::= Type Id
  syntax Params ::= List{Param,","}

  syntax Stmt ::= Type Exps ";"
                | Type Id "(" Params ")" Block
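
For instance, the following (hypothetical) fragment illustrates the
declaration syntax above; note that, per the typing policy, a complete
program would also have to declare a main() function:

  // Hypothetical typed SIMPLE declarations.
  int x = 7, y = x + 3;    // typed variable declarations
  int a[10];               // an array of 10 integers
  int max(int p, int q) {  // a function of type int,int -> int
    if (p > q) { return p; } else { return q; }
  }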

Expressions

The syntax of expressions is identical to that in untyped SIMPLE,
except for the logical conjunction and disjunction which have
different strictness attributes, because they now have different
evaluation strategies.

  syntax Exp ::= Int | Bool | String | Id
               | "(" Exp ")"             [bracket]
               | "++" Exp
               > Exp "[" Exps "]"        [strict]
               > Exp "(" Exps ")"        [strict]
               | "-" Exp                 [strict]
               | "sizeOf" "(" Exp ")"    [strict]
               | "read" "(" ")"
               > left:
                 Exp "*" Exp             [strict, left]
               | Exp "/" Exp             [strict, left]
               | Exp "%" Exp             [strict, left]
               > left:
                 Exp "+" Exp             [strict, left]
               | Exp "-" Exp             [strict, left]
               > non-assoc:
                 Exp "<" Exp             [strict, non-assoc]
               | Exp "<=" Exp            [strict, non-assoc]
               | Exp ">" Exp             [strict, non-assoc]
               | Exp ">=" Exp            [strict, non-assoc]
               | Exp "==" Exp            [strict, non-assoc]
               | Exp "!=" Exp            [strict, non-assoc]
               > "!" Exp                 [strict]
               > left:
                 Exp "&&" Exp            [strict, left]
               | Exp "||" Exp            [strict, left]
               > "spawn" Block
               > Exp "=" Exp             [strict(2), right]

Note that spawn has not been declared strict. This may
seem unexpected, because the child thread shares the same environment
with the parent thread, so from a typing perspective the spawned
statement makes the same sense in a child thread as it makes in the
parent thread. The reason for not declaring it strict is that we
want to disallow programs where the spawned thread uses the
return statement, because those programs would get stuck in
the dynamic semantics. The type semantics of spawn below will reject
such programs.

We still need lists of expressions, defined below, but note that we do
not need lists of identifiers anymore. They have been replaced by the lists
of parameters.

  syntax Exps ::= List{Exp,","}          [strict]

Statements

The statements have the same syntax as in untyped SIMPLE, except for
the exceptions, which now type their parameter. Note that, unlike in untyped
SIMPLE, all statement constructs which have arguments and are not desugared
are strict, including the conditional and the while. Indeed, from a
typing perspective, they are all strict: first type their arguments and then
type the actual construct.

  syntax Block ::= "{" "}"
                | "{" Stmt "}"

  syntax Stmt ::= Block
                | Exp ";"                                  [strict]
                | "if" "(" Exp ")" Block "else" Block      [avoid, strict]
                | "if" "(" Exp ")" Block
                | "while" "(" Exp ")" Block                [strict]
                | "for" "(" Stmt Exp ";" Exp ")" Block
                | "return" Exp ";"                         [strict]
                | "return" ";"
                | "print" "(" Exps ")" ";"                 [strict]
                | "try" Block "catch" "(" Param ")" Block  [strict(1