K is a rewrite-based
executable semantic framework in which programming languages, type
systems and formal analysis tools can be defined using configurations
and rules. Configurations organize the state in units called cells,
which are labeled and can be nested. K rewrite rules make it explicit
which parts of the term are read-only, write-only, read-write, or
unused. This makes K suitable for defining truly concurrent languages
even in the presence of sharing. Computations are represented as
syntactic extensions of the original language abstract syntax, using a
nested list structure which sequentializes computational tasks, such
as program fragments. Computations are like any other terms in a
rewriting environment: they can be matched, moved from one place to
another, modified, or deleted. This makes K suitable for defining
control-intensive features such as abrupt termination, exceptions, or
call/cc.
The purpose of this series of lessons is to teach developers how to program in
K. While the primary use of K is in the specification of operational semantics
of programming languages, this tutorial is agnostic on how the knowledge of K
is used. For a more detailed tutorial explaining the basic principles of
programming language design, refer to the
K PL Tutorial. Note that that tutorial is somewhat
out of date presently.
This K tutorial is a work in progress. Many lessons are currently simply
placeholders for future content.
To start the K tutorial, begin with
Section 1: Basic Programming in K.
The goal of this first section of the K tutorial is to teach the basic
principles of K to someone with no prior experience with K as a programming
language. However, it is not written for complete beginners to programming. We
assume that the reader has a firm grounding in computer science broadly and has
prior experience writing code in functional programming languages.
By the end of this section, the reader ought to be able to write specifications
of simple languages in K, use these specifications to generate a fast
interpreter for their programming language, as well as write basic deductive
program verification proofs over programs in their language. This should give
them the theoretical grounding they need to begin expanding their knowledge
of K in Section 2: Intermediate K Concepts.
To begin this section, refer to
Lesson 1.1: Setting up a K Environment.
The first step to learning K is to install K on your system, and configure your
editor for K development.
You have two options for how to install K, depending on how you intend to
interact with the K codebase. If you are solely a user of K, and have no
interest in developing or making changes to K, you most likely will want to
install one of our binary releases of K. However, if you are going to be a K
developer, or simply want to build K from source, you should follow the
instructions for a source build of K.
K is developed as a rolling release, with each change to K that passes our
CI infrastructure being deployed on GitHub for download. The latest release of
K can be downloaded here.
This page also contains information on how to install K. It is recommended
that you fully uninstall the old version of K prior to installing the new one,
as K does not maintain entries in package manager databases, with the exception
of Homebrew on macOS.
You can clone K from GitHub with the following Git command:
git clone https://github.com/runtimeverification/k --recursive
Instructions on how to build K from source can be found
here.
K maintains a set of scripts for a variety of text editors, including vim and
emacs, in various states of maintenance. You can download these scripts with
the following Git command:
git clone https://github.com/kframework/k-editor-support
Because K allows users to define their own grammars for parsing K itself,
not all features of K can be effectively highlighted. However, at the cost of
occasionally highlighting things incorrectly, you can get some pretty good
results in many cases. With that being said, some of the editor scripts in the
above repository are pretty out of date. If you manage to improve them, we
welcome pull requests into the repository.
If you have problems installing K, we encourage you to reach out to us. If you
follow the above install instructions and run into a problem, you can
create a bug report on GitHub.
Once you have set up K on your system to your satisfaction, you can continue to
Lesson 1.2: Basics of Functional K.
The purpose of this lesson is to explain the basics of productions and
rules in K. These are two types of K sentences. A K file consists of
one or more requires or modules in K. Each module consists of one or
more imports or sentences. For more information on requires, modules, and
sentences, refer to Lesson 1.5. However, for the time
being, just think of a module as a container for sentences, and don't worry
about requires or imports just yet.
To start with, input the following program into your editor as file
lesson-02-a.k:
module LESSON-02-A
  syntax Color ::= Yellow() | Blue()
  syntax Fruit ::= Banana() | Blueberry()
  syntax Color ::= colorOf(Fruit) [function]
  rule colorOf(Banana()) => Yellow()
  rule colorOf(Blueberry()) => Blue()
endmodule
Save this file and then run:
kompile lesson-02-a.k
kompile is K's compiler. By default, it takes a program or specification
written in K and compiles it into an interpreter for that input. Right now we
are compiling a single file. A set of K files that are compiled together is
called a K definition. We will cover multi-file K definitions later on.
kompile will output a directory containing everything needed to execute
programs and perform proofs using that definition. In this case, kompile will
(by default) create the directory lesson-02-a-kompiled under the current
directory.
Now, save the following input file in your editor as banana.color in the same
directory as lesson-02-a.k:
colorOf(Banana())
We can now evaluate this K term by running (from the same directory):
krun banana.color
krun will use the interpreter generated by the first call to kompile to
execute this program.
You will get the following output:
<k>
Yellow ( ) ~> .
</k>
For now, don't worry about the <k>, </k>, or ~> . portions of this
output.
You can also execute small programs directly by specifying them on the command
line instead of putting them in a file. For example, the same program above
could also have been executed by running the following command:
krun -cPGM='colorOf(Banana())'
Now, let's look at what this definition and program did.
The first thing to realize is that this K definition contains 5 productions.
Productions are introduced with the syntax keyword, followed by a sort,
followed by the operator ::= followed by the definition of one or more
productions themselves, separated by the | operator. There are different
types of productions, but for now we only care about constructors and
functions. Each declaration separated by the | operator is individually
a single production, and the | symbol simply groups together productions that
have the same sort. For example, we could equally have written an identical K
definition lesson-02-b.k like so:
module LESSON-02-B
  syntax Color ::= Yellow()
  syntax Color ::= Blue()
  syntax Fruit ::= Banana()
  syntax Fruit ::= Blueberry()
  syntax Color ::= colorOf(Fruit) [function]
  rule colorOf(Banana()) => Yellow()
  rule colorOf(Blueberry()) => Blue()
endmodule
You can try compiling and running lesson-02-b.k to see that it produces the
same output as lesson-02-a.k:
kompile lesson-02-b.k
krun -cPGM='colorOf(Banana())' --definition 'lesson-02-b-kompiled'
where the --definition option points to the directory containing a compiled
version of LESSON-02-B.
Even the following definition is equivalent:
module LESSON-02-C
  syntax Color ::= Yellow()
                 | Blue()
                 | colorOf(Fruit) [function]
  syntax Fruit ::= Banana()
                 | Blueberry()
  rule colorOf(Banana()) => Yellow()
  rule colorOf(Blueberry()) => Blue()
endmodule
Each of these types of productions named above has the same underlying syntax,
but context and attributes are used to distinguish between the different
types. Tokens, brackets, lists, macros, aliases, and anywhere productions will
be covered in a later lesson, but this lesson does introduce us to constructors
and functions. Yellow(), Blue(), Banana(), and Blueberry() are
constructors. You can think of a constructor like a constructor for an
algebraic data type, if you're familiar with a functional language. The data
type itself is the sort that appears on the left of the ::= operator. Sorts
in K consist of uppercase identifiers.
Constructors can have arguments, but these ones do not. We will cover the
syntax of productions in detail in the next lesson, but for now, you can write
a production with no arguments as an uppercase or lowercase identifier followed
by the () operator.
A function is distinguished from a constructor by the presence of the
function attribute. Attributes appear in a comma-separated list between
square brackets after any sentence, including both productions and rules.
Various attributes with built-in meanings exist in K and will be discussed
throughout the tutorial.
Use krun to compute the return value of the colorOf function on a
Blueberry().
Functions in K are given definitions using rules. A rule begins with the rule
keyword and contains at least one rewrite operator. The rewrite operator
is represented by the syntax =>. The rewrite operator is one of the built-in
productions in K, and we will discuss in more detail how it can be used in
future lessons, but for now, you can think of a rule as consisting of a
left-hand side and a right-hand side, separated by the rewrite
operator. On the left-hand side is the name of the function and zero or more
patterns corresponding to the parameters of the function. On the right-hand
side is another pattern. The meaning of the rule is relatively simple, having
defined these components. If the function is called with arguments that
match the patterns on the left-hand side, then the return value of the
function is the pattern on the right-hand side.
In the example above, if the argument of the colorOf function
is Banana(), then the return value of the function is Yellow().
So far we have seen that a constructor is one type of pattern in K. We
will introduce more complex patterns in later lessons, but there is one other
type of basic pattern: the variable. A variable, syntactically, consists
of an uppercase identifier. However, unlike a constructor, a variable will
match any pattern, with one exception: two variables with the same name
must match the same pattern.
Here is a more complex example (lesson-02-d.k):
module LESSON-02-D
  syntax Container ::= Jar(Fruit)
  syntax Fruit ::= Apple() | Pear()
  syntax Fruit ::= contentsOfJar(Container) [function]
  rule contentsOfJar(Jar(F)) => F
endmodule
Here we see that Jar is a constructor with a single argument. You can write a
production with multiple arguments by putting the sorts of the arguments in a
comma-separated list inside the parentheses.
In this example, F is a variable. It will match either Apple() or Pear().
The return value of the function is created by substituting the matched
values of all of the variables into the variables on the right-hand side of
the rule.
To demonstrate, compile this definition and execute the following program with
krun:
contentsOfJar(Jar(Apple()))
You will see when you run it that the program returns Apple(), because that
is the pattern that was matched by F.
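The two ideas above (a constructor with several arguments, and a variable that occurs more than once) can be sketched together. This definition is hypothetical and not part of the lesson files: the module name LESSON-02-SKETCH, the sorts Basket and Answer, and the symbols Both, Yes, No, and sameFruit are all names assumed here for illustration.

```k
module LESSON-02-SKETCH
  syntax Fruit ::= Apple() | Pear()
  // A constructor with two arguments: the argument sorts are comma-separated.
  syntax Basket ::= Both(Fruit, Fruit)
  syntax Answer ::= Yes() | No()
  syntax Answer ::= sameFruit(Basket) [function]

  // F occurs twice, so both arguments of Both must match the same pattern.
  rule sameFruit(Both(F, F)) => Yes()
  // The remaining cases are spelled out using constructors only.
  rule sameFruit(Both(Apple(), Pear())) => No()
  rule sameFruit(Both(Pear(), Apple())) => No()
endmodule
```

Assuming this is saved as lesson-02-sketch.k and compiled with kompile, running krun on the program sameFruit(Both(Pear(), Pear())) should evaluate to Yes(), while sameFruit(Both(Apple(), Pear())) should evaluate to No(). If several compiled definitions share the directory, pass --definition as shown earlier for lesson-02-b.k.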
… lesson-02-a.k with the addition of blackberries … colorOf function.
… Boolean, with two constructors, true and false. Each of hat, shirt, pants, … outfitMatching function that will return true if all …
Once you have completed the above exercises, you can continue to
Lesson 1.3: BNF Syntax and Parser Generation.
The purpose of this lesson is to explain the full syntax and semantics of
productions in K as well as how productions and other syntactic
sentences can be used to define grammars for parsing both rules and
programs.
K's grammar is divided into two components: the outer syntax of K and the
inner syntax of K. Outer syntax refers to the parsing of requires,
modules, imports, and sentences in a K definition. Inner syntax
refers to the parsing of rules and programs. Unlike the outer syntax of
K, which is predetermined, much of the inner syntax of K is defined by you, the
developer. When rules or programs are parsed, they are parsed within the
context of a module. Rules are parsed in the context of the module in which
they exist, whereas programs are parsed in the context of the
main syntax module of a K definition. The productions and other syntactic
sentences in a module are used to construct the grammar of the module, which
is then used to perform parsing.
To illustrate how this works, we will consider a simple K definition which
defines a relatively basic calculator capable of evaluating Boolean expressions
containing and, or, not, and xor.
Input the following program into your editor as file lesson-03-a.k:
module LESSON-03-A
  syntax Boolean ::= "true" | "false"
                   | "!" Boolean [function]
                   | Boolean "&&" Boolean [function]
                   | Boolean "^" Boolean [function]
                   | Boolean "||" Boolean [function]
endmodule
You will notice that the productions in this file look a little different than
the ones from the previous lesson. In fact, K has two different
mechanisms for defining productions. We have previously been focused
exclusively on the first mechanism, where the ::= symbol is followed by an
alphanumeric identifier followed by a comma-separated list of sorts in
parentheses. However, this is merely a special case of a more generic mechanism
for defining the syntax of productions using a variant of
BNF.
For example, in the previous lesson, we had the following set of productions:
module LESSON-03-B
  syntax Color ::= Yellow() | Blue()
  syntax Fruit ::= Banana() | Blueberry()
  syntax Color ::= colorOf(Fruit) [function]
endmodule
It turns out that this is equivalent to the following definition which defines
the same grammar, but using BNF notation:
module LESSON-03-C
  syntax Color ::= "Yellow" "(" ")" | "Blue" "(" ")"
  syntax Fruit ::= "Banana" "(" ")" | "Blueberry" "(" ")"
  syntax Color ::= "colorOf" "(" Fruit ")" [function]
endmodule
In this example, the sort of the argument to the function is unchanged, but
everything else has been wrapped in double quotation marks. This is because
in BNF notation, we distinguish between two types of production items:
terminals and non-terminals. A terminal represents simply a literal
string of characters that is verbatim part of the syntax of that production.
A non-terminal, conversely, represents a sort name, where the syntax of that
production accepts any valid term of that sort at that position.
This is why, when we wrote the program colorOf(Banana()), krun was able to
execute that program: because it represented a term of sort Color that was
parsed and interpreted by K's interpreter. In other words, krun parses and
interprets terms according to the grammar defined by the developer. The program
is automatically converted into an AST of that term, and then the colorOf
function is evaluated using the function rules provided in the definition.
You can ask yourself: How does K match the strings between the double quotes?
The answer is that K uses Flex to generate a scanner for the grammar. Flex looks
for the longest possible match of a regular expression in the input. If there
are ambiguities between 2 or more regular expressions, it will pick the one with
the highest prec attribute. You can learn more about how Flex matching works
here.
Bringing us back to the file lesson-03-a.k, we can see that this grammar
has given a simple BNF grammar for expressions over Booleans. We have defined
constructors corresponding to the Boolean values true and false, and functions
corresponding to the Boolean operators for and, or, not, and xor. We have also
given a syntax for each of these functions based on their syntax in the C
programming language. As such, we can now write programs in the simple language
we have defined.
Input the following program into your editor as and.bool in the same
directory:
true && false
We cannot interpret this program yet, because we have not given rules defining
the meaning of the && function, but we can parse it. To do this, you can
run (from the same directory):
kast --output kore and.bool
kast is K's just-in-time parser. It will generate a grammar from your K
definition on the fly and use it to parse the program passed on the command
line. The --output flag controls how the resulting AST is represented; don't
worry about the possible values yet, just use kore.
You ought to get the following AST printed on standard output, minus the
formatting:
inj{SortBoolean{}, SortKItem{}}(
Lbl'UndsAnd-And-UndsUnds'LESSON-03-A'Unds'Boolean'Unds'Boolean'Unds'Boolean{}(
Lbltrue'Unds'LESSON-03-A'Unds'Boolean{}(),
Lblfalse'Unds'LESSON-03-A'Unds'Boolean{}()
)
)
Don't worry about what exactly this means yet, just understand that it
represents the AST of the program that you just parsed. You ought to be able
to recognize the basic shape of it by seeing the words true, false, and
And in there. This is Kore, the intermediate representation of K, and we
will cover it in detail later.
Note that you can also tell kast to print the AST in other formats. For a
more direct representation of the original K, while still maintaining the
structure of an AST, you can say kast --output kast and.bool. This will
yield the following output:
`_&&__LESSON-03-A_Boolean_Boolean_Boolean`(
`true_LESSON-03-A_Boolean`(.KList),
`false_LESSON-03-A_Boolean`(.KList)
)
Note how the first output is largely a name-mangled version of the second
output. The one difference is the presence of the inj symbol in the Kore
output. We will talk more about this in later lessons.
Parse the expression false || true with --output kast. See if you can
predict approximately what the corresponding output would be with
--output kore, then run the command yourself and compare it to your
prediction.
Now let's try a slightly more advanced example. Input the following program
into your editor as and-or.bool:
true && false || false
When you try to parse this program, you ought to see the following error:
[Error] Inner Parser: Parsing ambiguity.
1: syntax Boolean ::= Boolean "||" Boolean [function]
`_||__LESSON-03-A_Boolean_Boolean_Boolean`(`_&&__LESSON-03-A_Boolean_Boolean_Boolean`(`true_LESSON-03-A_Boolean`(.KList),`false_LESSON-03-A_Boolean`(.KList)),`false_LESSON-03-A_Boolean`(.KList))
2: syntax Boolean ::= Boolean "&&" Boolean [function]
`_&&__LESSON-03-A_Boolean_Boolean_Boolean`(`true_LESSON-03-A_Boolean`(.KList),`_||__LESSON-03-A_Boolean_Boolean_Boolean`(`false_LESSON-03-A_Boolean`(.KList),`false_LESSON-03-A_Boolean`(.KList)))
Source(./and-or.bool)
Location(1,1,1,23)
This error is saying that kast was unable to parse this program because it is
ambiguous. K's just-in-time parser is a GLL parser, which means it can handle
the full generality of context-free grammars, including those grammars which
are ambiguous. An ambiguous grammar is one where the same string can be parsed
as multiple distinct ASTs. In this example, it can't decide whether it should
be parsed as (true && false) || false or as true && (false || false). As a
result, it reports the error to the user.
Currently there is no way of resolving this ambiguity, making it impossible
to write complex expressions in this language. This is obviously a problem.
The standard solution in most programming languages to this problem is to
use parentheses to indicate the appropriate grouping. K generalizes this notion
into a type of production called a bracket. A bracket production in K
is any production with the bracket attribute. It is required that such a
production only have a single non-terminal, and the sort of the production
must equal the sort of that non-terminal. However, K does not otherwise
impose restrictions on the grammar the user provides for a bracket. With that
being said, the most common type of bracket is one in which a non-terminal
is surrounded by terminals representing some type of bracket such as
(), [], {}, <>, etc. For example, we can define the most common
type of bracket, the type used by the vast majority of programming languages,
quite simply.
Consider the following modified definition, which we will save to
lesson-03-d.k:
module LESSON-03-D
  syntax Boolean ::= "true" | "false"
                   | "(" Boolean ")" [bracket]
                   | "!" Boolean [function]
                   | Boolean "&&" Boolean [function]
                   | Boolean "^" Boolean [function]
                   | Boolean "||" Boolean [function]
endmodule
In this definition, if the user does not explicitly write parentheses, the
grammar remains ambiguous and K's just-in-time parser will report an error.
However, you are now able to parse more complex programs by means of explicitly
grouping subterms with the bracket we have just defined.
Consider and-or-left.bool:
(true && false) || false
Now consider and-or-right.bool:
true && (false || false)
If you parse these programs with kast, you will once again get a single
unique AST with no error. If you look, you might notice that the bracket itself
does not appear in the AST. In fact, this is a property unique to brackets:
productions with the bracket attribute are not represented in the parsed AST
of a term, and the child of the bracket is folded immediately into the parent
term. This is the reason for the requirement that a bracket production have
a single non-terminal of the same sort as the production itself.
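Because K imposes no other restrictions on bracket productions, a definition may declare more than one. The following sketch is an assumption for illustration only: the module name LESSON-03-SKETCH and the square-bracket production are not part of the lesson files. It adds a second bracket alongside the parentheses.

```k
module LESSON-03-SKETCH
  // Both bracket productions below have exactly one non-terminal, and the
  // sort of that non-terminal equals the sort of the production (Boolean).
  syntax Boolean ::= "true" | "false"
                   | "(" Boolean ")" [bracket]
                   | "[" Boolean "]" [bracket]
                   | "!" Boolean [function]
                   | Boolean "&&" Boolean [function]
                   | Boolean "^" Boolean [function]
                   | Boolean "||" Boolean [function]
endmodule
```

Under this definition, the programs [true && false] || false and (true && false) || false would be expected to parse to the same AST, since neither bracket is represented in it.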
Write out what you expect the AST to be arising from parsing these two programs
above with --output kast, then parse them yourself and compare them to the
AST you expected. Confirm for yourself that the bracket production does not
appear in the AST.
So far we have seen how we can define the grammar of a language. However,
the grammar is not the only relevant part of parsing a language. Also relevant
is the lexical syntax of the language. Thus far, we have implicitly been using
K's automatic lexer generation to generate a token in the scanner for each
terminal in our grammar. However, sometimes we wish to define more complex
lexical syntax. For example, consider the case of integers in C: an integer
consists of a decimal, octal, or hexadecimal number followed by an optional
suffix indicating the type of the literal.
In theory it would be possible to define this syntax via a grammar, but not
only would it be cumbersome and tedious, you would also then have to deal with
an AST generated for the literal which is not convenient to work with.
Instead of doing this, K allows you to define token productions, where
a production consists of a regular expression followed by the token
attribute, and the resulting AST consists of a typed string containing the
value recognized by the regular expression.
For example, the builtin integers in K are defined using the following
production:
syntax Int ::= r"[\\+\\-]?[0-9]+" [token]
Here we can see that we have defined that an integer is an optional sign
followed by a non-empty sequence of digits. The r preceding the terminal
indicates that what appears inside the double quotes is a regular expression,
and the token attribute indicates that terms which parse as this production
should be converted into a token by the parser.
It is also possible to define tokens that do not use regular expressions. This
can be useful when you wish to declare particular identifiers for use in your
semantics later. For example:
syntax Id ::= "main" [token]
Here, we declare that main is a token of sort Id. Instead of being parsed
as a symbol, it gets parsed as a token, generating a typed string in the AST.
This is useful in a semantics of C because the parser generally does not treat
the main function in C specially; only the semantics treats it specially.
Of course, languages can have more complex lexical syntax. For example, if we
wish to define the syntax of integers in C, we could use the following
production:
syntax IntConstant ::= r"(([1-9][0-9]*)|(0[0-7]*)|(0[xX][0-9a-fA-F]+))(([uU][lL]?)|([uU]((ll)|(LL)))|([lL][uU]?)|(((ll)|(LL))[uU]?))?" [token]
As you may have noted above, long and complex regular expressions
can be hard to read. They also suffer from the problem that unlike a grammar,
they are not particularly modular.
We can get around this restriction by declaring explicit regular expressions,
giving them a name, and then referring to them in productions.
Consider the following (equivalent) way to define the lexical syntax of
integers in C:
syntax IntConstant ::= r"({DecConstant}|{OctConstant}|{HexConstant})({IntSuffix}?)" [token]
syntax lexical DecConstant = r"{NonzeroDigit}({Digit}*)"
syntax lexical OctConstant = r"0({OctDigit}*)"
syntax lexical HexConstant = r"{HexPrefix}({HexDigit}+)"
syntax lexical HexPrefix = r"0x|0X"
syntax lexical NonzeroDigit = r"[1-9]"
syntax lexical Digit = r"[0-9]"
syntax lexical OctDigit = r"[0-7]"
syntax lexical HexDigit = r"[0-9a-fA-F]"
syntax lexical IntSuffix = r"{UnsignedSuffix}({LongSuffix}?)|{UnsignedSuffix}{LongLongSuffix}|{LongSuffix}({UnsignedSuffix}?)|{LongLongSuffix}({UnsignedSuffix}?)"
syntax lexical UnsignedSuffix = r"[uU]"
syntax lexical LongSuffix = r"[lL]"
syntax lexical LongLongSuffix = r"ll|LL"
As you can see, this is rather more verbose, but it has the benefit of both
being much easier to read and understand, and also increased modularity.
Note that we refer to a named regular expression by putting the name in curly
brackets. Note also that only the first sentence actually declares a new piece
of syntax in the language. When the user writes syntax lexical, they are only
declaring a regular expression. To declare an actual piece of syntax in the
grammar, you still must actually declare an explicit token production.
One final note: K uses Flex to implement its lexical analysis. As a result,
you can refer to the Flex Manual
for a detailed description of the regular expression syntax supported. Note
that for performance reasons, Flex's regular expressions are actually a regular
language, and thus lack some of the syntactic convenience of modern
"regular expression" libraries. If you need features that are not part of the
syntax of Flex regular expressions, you are encouraged to express them via
a grammar instead.
So far we have been entirely focused on K's support for just-in-time parsing,
where the parser is generated on the fly prior to being used. This benefits
from being faster to generate the parser, but it suffers in performance if you
have to repeatedly parse strings with the same parser. For this reason, it is
generally encouraged that when parsing programs, you use K's ahead-of-time
parser generation. K makes use of
GNU Bison to generate parsers.
By default, you can enable ahead-of-time parsing via the --gen-bison-parser
flag to kompile. This will make use of Bison's LR(1) parser generator. As
such, if your grammar is not LR(1), it may not parse exactly the same as if
you were to use the just-in-time parser, because Bison will automatically pick
one of the possible branches whenever it encounters a shift-reduce or
reduce-reduce conflict. In this case, you can either modify your grammar to be
LR(1), or you can enable use of Bison's GLR support by instead passing
--gen-glr-bison-parser to kompile. Note that if your grammar is ambiguous,
the ahead-of-time parser will not provide you with particularly readable error
messages at this time.
If you have a K definition named foo.k, and it generates a directory when
you run kompile called foo-kompiled, you can invoke the ahead-of-time
parser you generated by running foo-kompiled/parser_PGM <file> on a file.
Compile lesson-03-d.k with ahead-of-time parsing enabled. Then compare
how long it takes to run kast --output kore and-or-left.bool with how long it
takes to run lesson-03-d-kompiled/parser_PGM and-or-left.bool. Confirm for
yourself that both produce the same result, but that the latter is faster.
Define a simple grammar consisting of integers, brackets, addition,
subtraction, multiplication, division, and unary negation. Integers should be
in decimal form and lexically without a sign, whereas negative numbers can be
represented via unary negation. Ensure that you are able to parse some basic
arithmetic expressions using a generated ahead-of-time parser. Do not worry
about disambiguating the grammar or about writing rules to implement the
operations in this definition.
Write a program where the meaning of the arithmetic expression based on
the grammar you defined above is ambiguous, and then write programs that
express each individual intended meaning using brackets.
Once you have completed the above exercises, you can continue to
Lesson 1.4: Disambiguating Parses.
The purpose of this lesson is to teach how to use K's builtin features for
disambiguation to transform an ambiguous grammar into an unambiguous one that
expresses the intended ASTs.
In practice, very few formal languages outside the domain of natural language
processing are ambiguous. The main reason for this is that parsing unambiguous
languages is asymptotically faster than parsing ambiguous languages.
Programming language designers instead usually use the notions of operator
precedence and associativity to make expression grammars unambiguous. These
mechanisms work by instructing the parser to reject certain ASTs in favor of
others in case of ambiguities; it is often possible to remove all ambiguities
in a grammar with these techniques.
While it is sometimes possible to explicitly rewrite the grammar to remove
these parses, because K's grammar specification and AST generation are
inextricably linked, this is generally discouraged. Instead, we use the
approach of explicitly expressing the relative precedence of different
operators in different situations in order to resolve the ambiguity.
For example, in C, && binds tighter in precedence than ||, meaning that
the expression true && false || false has only one valid AST:
(true && false) || false.
Consider, then, the third iteration on the grammar of this definition
(lesson-04-a.k):
module LESSON-04-A
  syntax Boolean ::= "true" | "false"
                   | "(" Boolean ")" [bracket]
                   > "!" Boolean [function]
                   > Boolean "&&" Boolean [function]
                   > Boolean "^" Boolean [function]
                   > Boolean "||" Boolean [function]
endmodule
In this example, some of the | symbols separating productions in a single
block have been replaced with >. This serves to describe the
priority groups associated with this block of productions.
The first priority group consists of the atoms of the
language: true, false, and the bracket operator. In general, a priority
group starts either at the ::= or > operator and extends until either the
next > operator or the end of the production block. Thus, we can see that the
second, third, fourth, and fifth priority groups in this grammar all consist
of a single production.
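A priority group may also hold several productions. The following variant is a hypothetical sketch (the module name LESSON-04-SKETCH and this particular grouping are assumptions, not part of the lesson files) in which && and ^ share one priority group:

```k
module LESSON-04-SKETCH
  // Four priority groups, in the order they are written:
  //   1. true, false, and the bracket
  //   2. !
  //   3. && and ^  (same group: joined by | rather than >)
  //   4. ||
  syntax Boolean ::= "true" | "false"
                   | "(" Boolean ")" [bracket]
                   > "!" Boolean [function]
                   > Boolean "&&" Boolean [function]
                   | Boolean "^" Boolean [function]
                   > Boolean "||" Boolean [function]
endmodule
```

The && and ^ productions sit between the same pair of > operators, so they form a single priority group, and no relative priority is declared between them.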
The meaning of these priority groups becomes apparent when parsing programs:
A symbol with a lesser priority (i.e., one that binds looser) cannot
appear as the direct child of a symbol with a greater priority (i.e.,
one that binds tighter). In this case, the > operator can be seen as a
greater-than operator describing a transitive partial ordering on the
productions in the production block, expressing their relative priority.
To see this more concretely, let's look again at the program
true && false || false. As noted before, previously this program was
ambiguous because the parser could either choose that && was the child of ||
or vice versa. However, because a symbol with lesser priority (i.e., ||)
cannot appear as the direct child of a symbol with greater priority
(i.e., &&), the parser will reject the parse where || is under the
&& operator. As a result, we are left with the unambiguous parse
(true && false) || false. Similarly, true || false && false parses
unambiguously as true || (false && false). Conversely, if the user explicitly
wants the other parse, they can express this using brackets by explicitly
writing true && (false || false). This still parses successfully because the
|| operator is no longer the direct child of the && operator, but is
instead the direct child of the () operator, and the && operator is an
indirect parent, which is not subject to the priority restriction.
Astute readers, however, will already have noticed what seems to be a
contradiction: we have defined `()` as also having greater priority than `||`.
One would think that this should mean that `||` cannot appear as a direct
child of `()`. This is a problem because priority groups are applied to every
possible parse separately. That is to say, even if the term is unambiguous
prior to this disambiguation rule, we still reject that parse if it violates
the rule of priority.
In fact, however, we do not reject this program as a parse error. Why is that?
Well, the rule for priority is slightly more complex than previously described.
In actual fact, it applies only conditionally. Specifically, it applies in
cases where the child is either the first or last production item in the
parent's production. For example, in the production `Boolean "&&" Boolean`, the
first `Boolean` non-terminal is not preceded by any terminals, and the last
`Boolean` non-terminal is not followed by any terminals. As a result of this,
we apply the priority rule to both children of `&&`. However, in the `()`
operator, the sole non-terminal is both preceded by and followed by terminals.
As a result, the priority rule is not applied when `()` is the parent. Because
of this, the program we mentioned above successfully parses.
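
The positional nature of the priority rule can be illustrated with a production that has a non-terminal in the middle. The following is a minimal sketch, not one of the lesson files: the module name `PRIORITY-SKETCH` and the conditional `? :` production are assumptions introduced only for illustration.

```k
// Hypothetical grammar: a conditional operator that binds tighter than "||".
module PRIORITY-SKETCH
  syntax Boolean ::= "true" | "false"
                   | "(" Boolean ")" [bracket]
                   > Boolean "?" Boolean ":" Boolean
                   > Boolean "||" Boolean
endmodule
```

Under the rule described above, `true ? false || false : true` should still be accepted, because the middle `Boolean` sits between the terminals `?` and `:` and is therefore exempt from the priority check. By contrast, `||` may not be the first or last child of the conditional, so `true || false ? true : false` should only parse with `||` at the top.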
Parse the program `true && false || false` using `kast`, and confirm that the
AST places `||` as the top level symbol. Then modify the definition so that you
will get the alternative parse.
Even having broken the expression grammar into priority blocks, the resulting
grammar is still ambiguous. We can see this if we try to parse the following
program (`assoc.bool`):
true && false && false
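Priority blocks will not help us here: the problem comes between two parses
where both possible parses have a direct parent and child which is within a
single priority block (in this case, `&&` is in the same block as itself).
This is where the notion of associativity comes into play. Associativity
applies the following additional rules to parses:
- a left-associative symbol cannot appear as a direct rightmost child of a
  symbol with equal priority;
- a right-associative symbol cannot appear as a direct leftmost child of a
  symbol with equal priority; and
- a non-associative symbol cannot appear as a direct leftmost or rightmost
  child of a symbol with equal priority.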
In C, binary operators such as `&&` are left-associative, meaning that the
expression `true && false && false` parses unambiguously as
`(true && false) && false`, because `&&` cannot appear as the rightmost child
of itself.
Consider, then, the fourth iteration on the grammar of this definition
(`lesson-04-b.k`):

module LESSON-04-B
  syntax Boolean ::= "true" | "false"
                   | "(" Boolean ")" [bracket]
                   > "!" Boolean [function]
                   > left: Boolean "&&" Boolean [function]
                   > left: Boolean "^" Boolean [function]
                   > left: Boolean "||" Boolean [function]
endmodule

Here each priority group, immediately after the `::=` or `>` operator, can
be followed by a symbol representing the associativity of that priority group:
either `left:` for left associativity, `right:` for right associativity, or
`non-assoc:` for non-associativity. In this example, each priority group we
apply associativity to has only a single production, but we could equally well
write a priority block with multiple productions and an associativity.
For example, consider the following, different grammar (`lesson-04-c.k`):

module LESSON-04-C
  syntax Boolean ::= "true" | "false"
                   | "(" Boolean ")" [bracket]
                   > "!" Boolean [function]
                   > left:
                     Boolean "&&" Boolean [function]
                   | Boolean "^" Boolean [function]
                   | Boolean "||" Boolean [function]
endmodule

In this example, unlike the one above, `&&`, `^`, and `||` have the same
priority. However, viewed as a group, the entire group is left associative.
This means that none of `&&`, `^`, and `||` can appear as the right child of
any of `&&`, `^`, or `||`. As a result of this, this grammar is also not
ambiguous. However, it expresses a different grammar, and you are encouraged
to think about what the differences are in practice.
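
The examples so far only use `left:`. As a minimal sketch of `right:`, the grammar below adds a right-associative implication operator. The module name `ASSOC-SKETCH` and the `->` production are assumptions made for illustration and are not part of the lesson files.

```k
// Hypothetical grammar mixing a left-associative and a right-associative group.
module ASSOC-SKETCH
  syntax Boolean ::= "true" | "false"
                   | "(" Boolean ")" [bracket]
                   > "!" Boolean
                   > left: Boolean "&&" Boolean
                   > right: Boolean "->" Boolean
endmodule
```

Since a right-associative symbol cannot appear as the direct leftmost child of a symbol in the same group, `true -> false -> false` should parse as `true -> (false -> false)`, while `true && false && false` still groups to the left.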
Parse the program `true && false && false` yourself, and confirm that the AST
places the rightmost `&&` at the top of the expression. Then modify the
definition to generate the alternative parse.
Previously we have only considered the case where all of the productions
which you wish to express a priority or associativity relation over are
co-located in the same block of productions. However, in practice this is not
always feasible or desirable, especially as a definition grows in size across
multiple modules.
As a result of this, K provides a second way of declaring priority and
associativity relations.
Consider the following grammar, which we will name `lesson-04-d.k` and which
will express the exact same grammar as `lesson-04-b.k`:

module LESSON-04-D
  syntax Boolean ::= "true" [group(literal)]
                   | "false" [group(literal)]
                   | "(" Boolean ")" [group(atom), bracket]
                   | "!" Boolean [group(not), function]
                   | Boolean "&&" Boolean [group(and), function]
                   | Boolean "^" Boolean [group(xor), function]
                   | Boolean "||" Boolean [group(or), function]

  syntax priority literal atom > not > and > xor > or
  syntax left and
  syntax left xor
  syntax left or
endmodule

This introduces a couple of new features of K. First, the `group(_)` attribute
is used to conceptually group together sets of sentences under a common
user-defined name. For example, `literal` in the `syntax priority` sentence is
used to refer to all the productions marked with the `group(literal)`
attribute, i.e., `true` and `false`. A production can belong to multiple groups
using syntax such as `group(myGrp1,myGrp2)`.
Once we understand this, it becomes relatively straightforward to understand
the meaning of this grammar. Each `syntax priority` sentence defines a
priority relation where `>` separates different priority groups. Each priority
group is defined by a list of one or more group names, and consists of all
productions which are members of at least one of those named groups.
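In the same way, a `syntax left`, `syntax right`, or `syntax non-assoc`
sentence defines an associativity relation among left-, right-, or
non-associative groups. Specifically, this means that:

syntax left a b

is different to:

syntax left a
syntax left b

The first sentence makes the productions in `a` and `b` left-associative with
respect to one another as well as to themselves, whereas the second pair of
sentences only makes each group left-associative with respect to itself.
As a consequence of this, `syntax [left|right|non-assoc]` should not be used to
group together labels with different priority.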
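
To make the grouped form concrete, here is a sketch of how the grammar of `lesson-04-c.k` (one shared, left-associative priority group for `&&`, `^`, and `||`) might be written with explicit declarations. The module name `LESSON-04-C-GROUPS` is hypothetical; the group names are reused from `lesson-04-d.k`.

```k
module LESSON-04-C-GROUPS
  syntax Boolean ::= "true"               [group(literal)]
                   | "false"              [group(literal)]
                   | "(" Boolean ")"      [group(atom), bracket]
                   | "!" Boolean          [group(not), function]
                   | Boolean "&&" Boolean [group(and), function]
                   | Boolean "^" Boolean  [group(xor), function]
                   | Boolean "||" Boolean [group(or), function]

  // and, xor, and or share one priority group and one associativity group.
  syntax priority literal atom > not > and xor or
  syntax left and xor or
endmodule
```

Listing `and xor or` in a single `syntax left` sentence is what makes the three operators left-associative with respect to each other, which is appropriate here because they also share the same priority.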
Sometimes priority and associativity prove insufficient to disambiguate a
grammar. In particular, sometimes it is desirable to be able to choose between
two ambiguous parses directly while still not rejecting any parses if the term
parsed is unambiguous. A good example of this is the famous "dangling else"
problem in imperative C-like languages.
Consider the following definition (`lesson-04-e.k`):

module LESSON-04-E
  syntax Exp ::= "true" | "false"
  syntax Stmt ::= "if" "(" Exp ")" Stmt
                | "if" "(" Exp ")" Stmt "else" Stmt
                | "{" "}"
endmodule

We can write the following program (`dangling-else.if`):
if (true) if (false) {} else {}
This is ambiguous because it is unclear whether the `else` clause is part of
the outer `if` or the inner `if`. At first we might try to resolve this with
priorities, saying that the `if` without an `else` cannot appear as a child of
the `if` with an `else`. However, because the non-terminal in the parent symbol
is both preceded and followed by a terminal, this will not work.
Instead, we can resolve the ambiguity directly by telling the parser to
"prefer" or "avoid" certain productions when ambiguities arise. For example,
when we parse this program, we see the following ambiguity as an error message:
[Error] Inner Parser: Parsing ambiguity.
1: syntax Stmt ::= "if" "(" Exp ")" Stmt
`if(_)__LESSON-04-E_Stmt_Exp_Stmt`(`true_LESSON-04-E_Exp`(.KList),`if(_)_else__LESSON-04-E_Stmt_Exp_Stmt_Stmt`(`false_LESSON-04-E_Exp`(.KList),`;_LESSON-04-E_Stmt`(.KList),`;_LESSON-04-E_Stmt`(.KList)))
2: syntax Stmt ::= "if" "(" Exp ")" Stmt "else" Stmt
`if(_)_else__LESSON-04-E_Stmt_Exp_Stmt_Stmt`(`true_LESSON-04-E_Exp`(.KList),`if(_)__LESSON-04-E_Stmt_Exp_Stmt`(`false_LESSON-04-E_Exp`(.KList),`;_LESSON-04-E_Stmt`(.KList)),`;_LESSON-04-E_Stmt`(.KList))
Source(./dangling-else.if)
Location(1,1,1,30)
Roughly, we see that the ambiguity is between an `if` with an `else` or an `if`
without an `else`. Since we want to pick the first parse, we can tell K to
"avoid" the second parse with the `avoid` attribute. Consider the following
modified definition (`lesson-04-f.k`):

module LESSON-04-F
  syntax Exp ::= "true" | "false"
  syntax Stmt ::= "if" "(" Exp ")" Stmt
                | "if" "(" Exp ")" Stmt "else" Stmt [avoid]
                | "{" "}"
endmodule

Here we have added the `avoid` attribute to the `else` production. As a result,
when an ambiguity occurs and one or more of the possible parses has that symbol
at the top of the ambiguous part of the parse, we remove those parses from
consideration and consider only those remaining. The `prefer` attribute behaves
similarly, but instead removes all parses which do not have that attribute.
In both cases, no action is taken if the parse is not ambiguous.
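
For comparison, the same choice can be sketched with `prefer` instead of `avoid`. The module name `PREFER-SKETCH` is an assumption for illustration; the productions are those of the dangling-else grammar above.

```k
// Hypothetical variant: prefer the "if" without "else" at the top of an ambiguity.
module PREFER-SKETCH
  syntax Exp  ::= "true" | "false"
  syntax Stmt ::= "if" "(" Exp ")" Stmt [prefer]
                | "if" "(" Exp ")" Stmt "else" Stmt
                | "{" "}"
endmodule
```

For `if (true) if (false) {} else {}`, the two candidate parses have the `if` without `else` and the `if` with `else` at the top respectively, so keeping only the preferred production should select the same parse that `avoid` on the `else` production selects.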
Parse the program `if (true) if (false) {} else {}` using `lesson-04-f.k`
and confirm that the `else` clause is part of the innermost `if` statement.
Then modify the definition so that you will get the alternative parse.
Modify your solution from Lesson 1.3, Exercise 2 so that unary negation should
bind tighter than multiplication and division, which should bind tighter than
addition and subtraction, and each binary operator should be left associative.
Write these priority and associativity declarations explicitly, and then
try to write them inline.
Write a simple grammar containing at least one ambiguity that cannot be
resolved via priority or associativity, and then use the `prefer` attribute to
resolve that ambiguity.
Explain why the following grammar is not labeled ambiguous by the K parser when
parsing `abb`, then make the parser realize the ambiguity.

module EXERCISE4
  syntax Expr ::= "a" Expr "b"
                | "abb"
                | "b"
endmodule

Once you have completed the above exercises, you can continue to
Lesson 1.5: Modules, Imports, and Requires.
The purpose of this lesson is to explain how K definitions can be broken into
separate modules and files and how these distinct components combine into a
complete K definition.
Recall from Lesson 1.3 that K's grammar is broken
into two components: the outer syntax of K and the inner syntax of K.
Outer syntax, as previously mentioned, consists of requires, modules,
imports, and sentences. A K semantics is expressed by the set of
sentences contained in the definition. The scope of what is considered
contained in that definition is determined both by the main semantics
module of a K definition, as well as the requires and imports present
in the file that contains that module.
The basic unit of grouping sentences in K is the module. A module consists
of a module name, an optional list of attributes, a list of
imports, and a list of sentences.
A module name consists of one or more groups of letters, numbers, or
underscores, separated by a hyphen. Here are some valid module names: `FOO`,
`FOO-BAR`, `foo0`, `foo0_bar-Baz9`. Here are some invalid module names: `-`,
`-FOO`, `BAR-`, `FOO--BAR`. Stylistically, module names are usually all
uppercase with hyphens separating words, but this is not strictly enforced.
Some example modules include an empty module:

module LESSON-05-A
endmodule

A module with some attributes:

module LESSON-05-B [group(attr1,attr2), private]
endmodule

A module with some sentences:

module LESSON-05-C
  syntax Boolean ::= "true" | "false"
  syntax Boolean ::= "not" Boolean [function]
  rule not true => false
  rule not false => true
endmodule

Thus far we have only discussed definitions containing a single module.
Definitions can also contain multiple modules, in which one module imports
others.
An import in K appears at the top of a module, prior to any sentences. It can
be specified with the `imports` keyword, followed by a module name.
For example, here is a simple definition with two modules (`lesson-05-d.k`):

module LESSON-05-D-1
  syntax Boolean ::= "true" | "false"
  syntax Boolean ::= "not" Boolean [function]
endmodule

module LESSON-05-D
  imports LESSON-05-D-1
  rule not true => false
  rule not false => true
endmodule

This K definition is equivalent to the definition expressed by the single module
`LESSON-05-C`. Essentially, by importing a module, we include all of the
sentences of the imported module in the importing module.
There are a few minor differences between importing a module and simply
including its sentences in another module directly, but we will cover these
differences later. For now, you can think of modules as a way of
conceptually grouping sentences in a larger K definition.
Modify `lesson-05-d.k` to include four modules: one containing the syntax, two
with one rule each that imports the first module, and a final module
`LESSON-05-D` containing no sentences that imports the second and third module.
Check to make sure the definition still compiles and that you can still evaluate
the `not` function.
As you may have noticed, each module in a definition can express a distinct set
of syntax. When parsing the sentences in a module, we use the syntax
of that module, enriched with the basic syntax of K, in order to parse
rules in that module. For example, the following definition is a parser error
(`lesson-05-e.k`):
module LESSON-05-E-1
rule not true => false
rule not false => true
endmodule
module LESSON-05-E-2
syntax Boolean ::= "true" | "false"
syntax Boolean ::= "not" Boolean [function]
endmodule
This is because the syntax referenced in module `LESSON-05-E-1`, namely, `not`,
`true`, and `false`, is not imported by that module. You can solve this problem
by simply importing the modules containing the syntax you want to use in your
sentences.
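
As a sketch of that fix applied to the definition above, the module containing the rules imports the module that declares the syntax. Only the `imports` line is new; everything else is taken from `lesson-05-e.k`.

```k
module LESSON-05-E-2
  syntax Boolean ::= "true" | "false"
  syntax Boolean ::= "not" Boolean [function]
endmodule

module LESSON-05-E-1
  // Bring not, true, and false into scope for the rules below.
  imports LESSON-05-E-2
  rule not true => false
  rule not false => true
endmodule
```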
When we are compiling a K definition, we need to know where to start. We
designate two specific entry point modules: the main syntax module
and the main semantics module. The main syntax module, as well as all the
modules it imports recursively, are used to create the parser for the programs
that you execute with `krun`. The main semantics module, as well as all the
modules it imports recursively, are used to determine the rules that can be
applied at runtime in order to execute a program. For example, in
`lesson-05-d.k` above, if the main semantics module is module `LESSON-05-D-1`,
then `not` is an uninterpreted function (i.e., has no rules associated with
it), and the rules in module `LESSON-05-D` are not included.
While you can specify the entry point modules explicitly by passing the
`--main-module` and `--syntax-module` flags to `kompile`, by default, if you
type `kompile foo.k`, then the main semantics module will be `FOO` and the
main syntax module will be `FOO-SYNTAX`.
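
A file laid out to match these defaults might look like the following sketch. The file name `foo.k` comes from the paragraph above; the contents are an assumption, reusing the `not` function from earlier in this lesson.

```k
// foo.k: module names follow the defaults kompile derives from the file name.
module FOO-SYNTAX
  syntax Boolean ::= "true" | "false"
  syntax Boolean ::= "not" Boolean [function]
endmodule

module FOO
  imports FOO-SYNTAX
  rule not true => false
  rule not false => true
endmodule
```

With this layout, `kompile foo.k` should need no extra flags, since `FOO` and `FOO-SYNTAX` are exactly the names it looks for by default.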
So far, we have discussed ways to break definitions into separate
conceptual components (modules). K also provides a mechanism for combining
multiple files into a single K definition, namely, the requires directive.
In K, the `requires` keyword has two meanings. The first, the requires
statement, appears at the top of a K file, prior to any module declarations. It
consists of the keyword `requires` followed by a double-quoted string. The
second meaning of the `requires` keyword will be covered in a later lesson,
but it is distinguished because the second case occurs only inside modules.
The string passed to the requires statement contains a filename. When you run
`kompile` on a file, it will look at all of the `requires` statements in that
file, look up those files on disk, parse them, and then recursively process all
the requires statements in those files. It then combines all the modules in all
of those files together, and uses them collectively as the set of modules to
which `imports` statements can refer.
Putting it all together, here is one possible way in which we could break the
definition `lesson-02-c.k` from Lesson 1.2 into multiple files and modules:

`colors.k`:

module COLORS
  syntax Color ::= Yellow() | Blue()
endmodule

`fruits.k`:

module FRUITS
  syntax Fruit ::= Banana() | Blueberry()
endmodule

`colorOf.k`:

requires "fruits.k"
requires "colors.k"
module COLOROF-SYNTAX
imports COLORS
imports FRUITS
syntax Color ::= colorOf(Fruit) [function]
endmodule
module COLOROF
imports COLOROF-SYNTAX
rule colorOf(Banana()) => Yellow()
rule colorOf(Blueberry()) => Blue()
endmodule
You would then compile this definition with `kompile colorOf.k` and use it the
same way as the original, single-module definition.
Modify the name of the `COLOROF` module, and then recompile the definition.
Try to understand why you now get a compiler error. Then, resolve this compiler
error by passing the `--main-module` and `--syntax-module` flags to `kompile`.
One note can be made about how paths are resolved in `requires` statements.
By default, the path you specify is allowed to be an absolute or a relative
path. If the path is absolute, that exact file is imported. If the path is
relative, a matching file is looked for within all of the
include directories specified to the compiler. By default, the include
directories include the current working directory, followed by the
`include/kframework/builtin` directory within your installation of K. You can
also pass one or more directories to `kompile` via the `-I` command line flag,
in which case these directories are prepended to the beginning of the list.
Take the solution to Lesson 1.4, Exercise 2 which included the explicit
priority and associativity declarations, and modify the definition so that
the syntax of integers and brackets is in one module, the syntax of addition,
subtraction, and unary negation is in another module, and the syntax of
multiplication and division is in a third module. Make sure you can still parse
the same set of expressions as before. Place priority declarations in the main
module.
Modify `lesson-02-d.k` from Lesson 1.2 so that the rules and syntax are in
separate modules in separate files.
Place the file containing the syntax from Exercise 2 in another directory,
then recompile the definition. Observe why a compilation error occurs. Then
fix the compiler error by passing `-I` to `kompile`.
Once you have completed the above exercises, you can continue to
Lesson 1.6: Integers and Booleans.
The purpose of this lesson is to explain the two most basic types of builtin
sorts in K, the `Int` sort and the `Bool` sort, representing
arbitrary-precision integers and Boolean algebra.
K provides definitions of some useful sorts in `domains.md`, found in the
`include/kframework/builtin` directory of the K installation. This file is
defined via a literate programming style that we will discuss in a future
lesson. We will not cover all of the sorts found there immediately; however,
this lesson discusses some of the details surrounding integers and Booleans,
and explains how to look up more detailed information about builtin functions
in K's documentation.
The most basic builtin sort K provides is the `Bool` sort, representing
Boolean values (i.e., `true` and `false`). You have already seen how we were
able to create this type ourselves using K's parsing and disambiguation
features. However, in the vast majority of cases, we prefer instead to import
the version of Boolean algebra defined by K itself. Most simply, you can do
this by importing the module `BOOL` in your definition. For example
(`lesson-06-a.k`):

module LESSON-06-A
  imports BOOL
  syntax Fruit ::= Blueberry() | Banana()
  syntax Bool ::= isBlue(Fruit) [function]
  rule isBlue(Blueberry()) => true
  rule isBlue(Banana()) => false
endmodule

Here we have defined a simple predicate, i.e., a function returning a
Boolean value. We are now able to perform the usual Boolean operations of
and, or, and not over these values. For example (`lesson-06-b.k`):

module LESSON-06-B
  imports BOOL
  syntax Fruit ::= Blueberry() | Banana()
  syntax Bool ::= isBlue(Fruit) [function]
  rule isBlue(Blueberry()) => true
  rule isBlue(Banana()) => false
  syntax Bool ::= isYellow(Fruit) [function]
                | isBlueOrYellow(Fruit) [function]
  rule isYellow(Banana()) => true
  rule isYellow(Blueberry()) => false
  rule isBlueOrYellow(F) => isBlue(F) orBool isYellow(F)
endmodule

In the above example, Boolean inclusive or is performed via the `orBool`
function, which is defined in the `BOOL` module. As a matter of convention,
many functions over builtin sorts in K are suffixed with the name of the
primary sort over which those functions are defined. This happens so that the
syntax of K does not (generally) conflict with the syntax of any other
programming language, which would make it harder to define that programming
language in K.
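
The same suffix convention applies to the other Boolean operations. The sketch below combines `notBool` and `orBool` from the `BOOL` module; the module name `BOOL-OPS-SKETCH` and the predicate `isNeitherBlueNorYellow` are hypothetical names introduced for illustration.

```k
module BOOL-OPS-SKETCH
  imports BOOL
  syntax Fruit ::= Blueberry() | Banana()
  syntax Bool ::= isBlue(Fruit) [function]
                | isYellow(Fruit) [function]
                | isNeitherBlueNorYellow(Fruit) [function]
  rule isBlue(Blueberry()) => true
  rule isBlue(Banana()) => false
  rule isYellow(Banana()) => true
  rule isYellow(Blueberry()) => false
  // Parentheses make the intended grouping explicit.
  rule isNeitherBlueNorYellow(F) => notBool (isBlue(F) orBool isYellow(F))
endmodule
```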
Write a function `isBlueAndNotYellow` which computes the appropriate Boolean
expression. If you are unsure what the appropriate syntax is to use, you
can refer to the `BOOL` module in `domains.md`. Add a term of
sort `Fruit` for which `isBlue` and `isYellow` both return true, and test that
the `isBlueAndNotYellow` function behaves as expected on all three `Fruit`s.
For most sorts in `domains.md`, K defines more than one module that can be
imported by users. For example, for the `Bool` sort, K defines the `BOOL`
module that has already been discussed, but also provides the
`BOOL-SYNTAX` module. This module, unlike the `BOOL` module, only declares the
values `true` and `false`, but not any of the functions that operate over the
`Bool` sort. The rationale is that you may want to import this module into the
main syntax module of your definition in some cases, whereas you generally do
not want to do this with the version of the module that includes all the
functions over the `Bool` sort. For example, if you were defining the semantics
of C++, you might import `BOOL-SYNTAX` into the syntax module of your
definition, because `true` and `false` are part of the grammar of C++, but
you would only import the `BOOL` module into the main semantics module, because
C++ defines its own syntax for and, or, and not that is different from the
syntax defined in the `BOOL` module.
Here, for example, is how we might redefine our Boolean expression calculator
to use the `Bool` sort while maintaining an idiomatic structure of modules
and imports, for the first time including the rules to calculate the values of
expressions themselves (`lesson-06-c.k`):

module LESSON-06-C-SYNTAX
  imports BOOL-SYNTAX
  syntax Bool ::= "(" Bool ")" [bracket]
                > "!" Bool [function]
                > left:
                  Bool "&&" Bool [function]
                | Bool "^" Bool [function]
                | Bool "||" Bool [function]
endmodule

module LESSON-06-C
  imports LESSON-06-C-SYNTAX
  imports BOOL
  rule ! B => notBool B
  rule A && B => A andBool B
  rule A ^ B => A xorBool B
  rule A || B => A orBool B
endmodule

Note the encapsulation of syntax: the `LESSON-06-C-SYNTAX` module contains
exactly the syntax of our Boolean expressions, and no more, whereas any other
syntax needed to implement those functions is in the `LESSON-06-C` module
instead.
Add an "implies" function to the above Boolean expression calculator, using the
`->` symbol to represent implication. You can look up K's builtin "implies"
function in the `BOOL` module in `domains.md`.
Unlike most programming languages, where the most basic integer type is a
fixed-precision integer type, the most commonly used integer sort in K is
the `Int` sort, which represents the mathematical integers, i.e.,
arbitrary-precision integers.
K provides three main modules for import when using the `Int` sort. The first,
containing all the syntax of integers as well as all of the functions over
integers, is the `INT` module. The second, which provides just the syntax
of integer literals themselves, is the `INT-SYNTAX` module. However, unlike
most builtin sorts in K, K also provides a third module for the `Int` sort:
the `UNSIGNED-INT-SYNTAX` module. This module provides only the syntax of
non-negative integers, i.e., natural numbers. The reasons for this involve
lexical ambiguity. Generally speaking, in most programming languages, `-1` is
not a literal, but instead a literal to which the unary negation operator is
applied. K thus provides this module to make it easier to specify the syntax
of such languages.
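
As a rough sketch of when `UNSIGNED-INT-SYNTAX` is useful, consider a small expression grammar with its own unary negation. The module name `NEG-SKETCH-SYNTAX` and the sort `Exp` are assumptions chosen for illustration.

```k
// Hypothetical syntax module: integer literals carry no sign of their own.
module NEG-SKETCH-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  syntax Exp ::= Int
               | "(" Exp ")" [bracket]
               > "-" Exp
               > left:
                 Exp "+" Exp
               | Exp "-" Exp
endmodule
```

Because the imported literal syntax only covers non-negative integers, a program such as `-1` can only be read as the unary `-` production applied to the literal `1`, rather than competing with a negative literal.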
For detailed information about the functions available over the `Int` sort,
refer to `domains.md`. Note again how we append `Int` to the end of most of the
integer operations to ensure they do not collide with the syntax of other
programming languages.
Extend your solution from Lesson 1.4, Exercise 2 to implement the rules
that define the behavior of addition, subtraction, multiplication, and
division. Do not worry about the case when the user tries to divide by zero
at this time. Use `/Int` to implement division. Test your new calculator
implementation by executing the arithmetic expressions you wrote as part of
Lesson 1.3, Exercise 2. Check to make sure each computes the value you expected.
Combine the Boolean expression calculator from this lesson with your
solution to Exercise 1, and then extend the combined calculator with the `<`,
`<=`, `>`, `>=`, `==`, and `!=` expressions. Write some Boolean expressions
that combine integer and Boolean operations, and test to ensure that these
expressions return the expected truth value.
Compute the following expressions using your solution from Exercise 2:
`7 / 3`, `7 / -3`, `-7 / 3`, `-7 / -3`. Then replace the `/Int` function in
your definition with `divInt` instead, and observe how the value of the above
expressions changes. Why does this occur?
Once you have completed the above exercises, you can continue to
Lesson 1.7: Side Conditions and Rule Priority.
The purpose of this lesson is to explain how to write conditional rules in K,
and to explain how to control the order in which rules are tried.
So far, all of the rules we have discussed have been unconditional rules.
If the left-hand side of the rule matches the arguments to the function, the
rule applies. However, there is another type of rule, a conditional rule.
A conditional rule consists of a rule body containing the patterns to
match, and a side condition representing a Boolean expression that must
evaluate to true in order for the rule to apply.
Side conditions in K are introduced via the `requires` keyword immediately
following the rule body. For example, here is a rule with a side condition
(`lesson-07-a.k`):

module LESSON-07-A
  imports BOOL
  imports INT
  syntax Grade ::= "letter-A"
                 | "letter-B"
                 | "letter-C"
                 | "letter-D"
                 | "letter-F"
                 | gradeFromPercentile(Int) [function]
  rule gradeFromPercentile(I) => letter-A requires I >=Int 90
endmodule

In this case, the `gradeFromPercentile` function takes a single integer
argument. The function evaluates to `letter-A` if the argument passed is
greater than or equal to 90. Note that the side condition is allowed to refer
to variables that appear on the left-hand side of the rule. In the same manner
as variables appearing on the right-hand side, variables that appear in the
side condition evaluate to the value that was matched on the left-hand side.
Then the functions in the side condition are evaluated, which returns a term of
sort `Bool`. If the term is equal to `true`, then the rule applies. Bear in mind
that the side condition is only evaluated at all if the patterns on the
left-hand side of the rule match the term being evaluated.
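
As a further sketch of side conditions, here is a function with two conditional rules whose conditions do not overlap, so that at most one rule applies to any argument. The module name `SIDE-CONDITION-SKETCH` and the function `magnitude` are hypothetical names used only for illustration.

```k
module SIDE-CONDITION-SKETCH
  imports BOOL
  imports INT
  syntax Int ::= magnitude(Int) [function]
  // Both rules match any integer; the side conditions decide which one applies.
  rule magnitude(I) => I        requires I >=Int 0
  rule magnitude(I) => 0 -Int I requires I <Int 0
endmodule
```

In both rules the variable `I` bound on the left-hand side is reused in the side condition and on the right-hand side.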
Write a rule that evaluates `gradeFromPercentile` to `letter-B` if the argument
to the function is in the range [80,90). Test that the function correctly
evaluates various numbers between 80 and 100.
`owise` Rules

So far, all the rules we have introduced have had the same priority. What
this means is that K does not necessarily enforce an order in which the rules
are tried. We have only discussed functions so far in K, so it is not
immediately clear why this choice was made, given that a function is not
considered well-defined if multiple rules for evaluating it are capable of
evaluating the same arguments to different results. However, in future lessons
we will discuss other types of rules in K, some of which can be
non-deterministic. What this means is that if more than one rule is capable
of matching, then K will explore both possible rules in parallel, and consider
each of their respective results when executing your program. Don't worry too
much about this right now, but just understand that because of the potential
later for nondeterminism, we don't enforce a total ordering on the order in
which K attempts to apply rules.
However, sometimes this is not practical; it can be very convenient to express
that a particular rule applies if no other rules for that function are
applicable. This can be expressed by adding the `owise` attribute to a rule.
What this means, in practice, is that this rule has lower priority than other
rules, and will only be tried after all the other, higher-priority rules have
been tried and have failed.
For example, in the above exercise, we had to add a side condition containing
two Boolean comparisons to the rule we wrote to handle `letter-B` grades.
However, in practice this meant that we compare the percentile to 90 twice. We
can more efficiently and more idiomatically write the `letter-B` case for the
`gradeFromPercentile` rule using the `owise` attribute (`lesson-07-b.k`):

module LESSON-07-B
  imports BOOL
  imports INT
  syntax Grade ::= "letter-A"
                 | "letter-B"
                 | "letter-C"
                 | "letter-D"
                 | "letter-F"
                 | gradeFromPercentile(Int) [function]
  rule gradeFromPercentile(I) => letter-A requires I >=Int 90
  rule gradeFromPercentile(I) => letter-B requires I >=Int 80 [owise]
endmodule

The second rule is saying, "if all the other rules do not apply, then the grade
is a B if the percentile is greater than or equal to 80." Note here that we use
both a side condition and an `owise` attribute on the same rule. This is not
required (as we will see later), but it is allowed. What this means is that the
side condition is only tried if the other rules did not apply and the
left-hand side of the rule matched. You can even use more complex matching on
the left-hand side than simply a variable. More generally, you can also have
multiple higher-priority rules, or multiple `owise` rules. What this means in
practice is that all of the non-`owise` rules are tried first, in any order,
followed by all the `owise` rules, in any order.
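
An `owise` rule without a side condition can serve as a catch-all. The following is a minimal sketch; the module name `OWISE-SKETCH` and the function `isZero` are hypothetical names used only for illustration.

```k
module OWISE-SKETCH
  imports BOOL
  imports INT
  syntax Bool ::= isZero(Int) [function]
  // Tried first: matches only the literal 0.
  rule isZero(0) => true
  // Tried only if the rule above does not apply; _ matches any term.
  rule isZero(_) => false [owise]
endmodule
```

Here the first rule uses a literal rather than a variable on its left-hand side, and the `owise` rule handles every other integer without needing any comparison.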
The grades `D` and `F` correspond to the percentile ranges [60, 70) and [0, 60)
respectively. Write another implementation of `gradeFromPercentile` which
handles only these cases, and uses the `owise` attribute to avoid redundant
Boolean comparisons. Test that various percentiles in the range [0, 70) are
evaluated correctly.
As it happens, the `owise` attribute is a specific case of a more general
concept we call rule priority. In essence, each rule is assigned an integer
priority. Rules are tried in increasing order of priority, starting with a
rule with priority zero, and trying each increasing numerical value
successively.
By default, a rule is assigned a priority of 50. If the rule has the `owise`
attribute, it is instead given the priority 200. You can see why this will
cause `owise` rules to be tried after regular rules.
However, it is also possible to directly assign a numerical priority to a rule
via the `priority` attribute. For example, here is an alternative way
we could express the same two rules in the `gradeFromPercentile` function
(`lesson-07-c.k`):

module LESSON-07-C
  imports BOOL
  imports INT
  syntax Grade ::= "letter-A"
                 | "letter-B"
                 | "letter-C"
                 | "letter-D"
                 | "letter-F"
                 | gradeFromPercentile(Int) [function]
  rule gradeFromPercentile(I) => letter-A requires I >=Int 90 [priority(50)]
  rule gradeFromPercentile(I) => letter-B requires I >=Int 80 [priority(200)]
endmodule

We can, of course, assign a priority equal to any non-negative integer. For
example, here is a more complex example that handles the remaining grades
(`lesson-07-d.k`):

module LESSON-07-D
  imports BOOL
  imports INT
  syntax Grade ::= "letter-A"
                 | "letter-B"
                 | "letter-C"
                 | "letter-D"
                 | "letter-F"
                 | gradeFromPercentile(Int) [function]
  rule gradeFromPercentile(I) => letter-A requires I >=Int 90 [priority(50)]
  rule gradeFromPercentile(I) => letter-B requires I >=Int 80 [priority(51)]
  rule gradeFromPercentile(I) => letter-C requires I >=Int 70 [priority(52)]
  rule gradeFromPercentile(I) => letter-D requires I >=Int 60 [priority(53)]
  rule gradeFromPercentile(_) => letter-F [priority(54)]
endmodule

Note that we have introduced a new piece of syntax here: `_`. This is actually
just a variable. However, as a special case, when a variable is named `_`, it
does not bind a value that can be used on the right-hand side of the rule, or
in a side condition. Effectively, `_` is a placeholder variable that means "I
don't care about this term."
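As a further illustration of the `_` placeholder, here is a minimal sketch. The module name `WILDCARD-SKETCH` and the function `isZero` are hypothetical and are not part of the tutorial's files; they exist only to show the placeholder in a second setting.

```k
module WILDCARD-SKETCH
  imports BOOL
  imports INT

  // isZero is a hypothetical function used only for illustration.
  syntax Bool ::= isZero(Int) [function]

  // Matches only the integer literal 0.
  rule isZero(0) => true
  // `_` matches any other integer. Its value cannot be referenced on the
  // right-hand side or in a side condition.
  rule isZero(_) => false [owise]
endmodule
```

Under these assumptions, `isZero(0)` should evaluate to `true` and `isZero(7)` to `false`.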
In this example, we have explicitly expressed the order in which the rules of
this function are tried. Since rules are tried in increasing numerical
priority, we first try the rule with priority 50, then 51, then 52, 53, and
finally 54.
As a final note, remember that if you assign a rule a priority higher than 200,
it will be tried after a rule with the `owise` attribute, and if you assign
a rule a priority less than 50, it will be tried before a rule with no
explicit priority.
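The interaction between an explicit low priority, the default priority, and `owise` can be sketched as follows. The module `PRIORITY-SKETCH` and the function `describe` are hypothetical names chosen for this illustration only.

```k
module PRIORITY-SKETCH
  imports BOOL
  imports INT
  imports STRING

  // describe is a hypothetical function used only for illustration.
  syntax String ::= describe(Int) [function]

  // Priority 10 is lower than the default of 50, so this rule is tried first.
  rule describe(0) => "zero" [priority(10)]
  // No explicit priority, so this rule has the default priority of 50.
  // It also matches 0, but the rule above is tried before it.
  rule describe(I) => "non-negative" requires I >=Int 0
  // owise corresponds to priority 200, so this rule is tried last.
  rule describe(_) => "negative" [owise]
endmodule
```

Following the ordering described above, `describe(0)` should yield `"zero"` rather than `"non-negative"`, even though both rules match.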
1. Write a function `isEven` that returns whether an integer is an even number.
   Use two rules and one side condition. The right-hand side of the rules should
   be Boolean literals. Refer back to domains.md for the relevant
   integer operations.
2. Modify the calculator application from Lesson 1.6, Exercise 2, so that division
   by zero will no longer make `krun` crash with a "Division by zero" exception.
   Instead, the `/` function should not match any of its rules if the denominator
   is zero.
3. Write your own implementation of `==`, `<`, `<=`, `>`, `>=` for integers and
   modify your solution from Exercise 2 to use it. You can use any arithmetic
   operations in the `INT` module, but do not use any built-in Boolean functions
   for comparing integers.
   Hint: Use pattern matching and recursive definitions with rule priorities.
Once you have completed the above exercises, you can continue to
Lesson 1.8: Literate Programming with Markdown.
The purpose of this lesson is to teach a paradigm for performing literate
programming in K, and explain how this can be used to create K definitions
that are also documentation.
The K tutorial so far has been written in
Markdown. Markdown,
for those not already familiar, is a lightweight plain-text format for styling
text. From this point onward, we assume you are familiar with Markdown and how
to write Markdown code. You can refer to the above link for a tutorial if you
are not already familiar.
What you may not necessarily realize, however, is that the K tutorial is also
a sequence of K definitions written in the manner of
Literate Programming.
For detailed information about Literate Programming, you can read the linked
Wikipedia article, but the short summary is that literate programming is a way
of intertwining documentation and code together in a manner that allows
executable code to also be, simultaneously, a documented description of that
code.
K is provided with built-in support for literate programming using Markdown.
By default, if you pass a file with the `.md` file extension to `kompile`, it
will look for any code blocks containing K code in that file, extract out
that K code into pure K, and then compile it as if it were a `.k` file.
A K code block begins with a line of text containing the keyword `` ```k ``,
and ends at the next line containing the keyword `` ``` ``.
For example, if you view the markdown source of this document, this is a K
code block:
module LESSON-08
  imports INT
Only the code inside K code blocks will actually be sent to the compiler. The
rest, while it may appear in the document when rendered by a markdown viewer,
is essentially a form of code comment.
When you have multiple K code blocks in a document, K will append each one
together into a single file before passing it off to the outer parser.
For example, the following code block contains sentences that are part of the
`LESSON-08` module that we began declaring above:
  syntax Int ::= Int "+" Int [function]
  rule I1 + I2 => I1 +Int I2
Compile this file with `kompile README.md --main-module LESSON-08`. Confirm
that you can use the resulting compiled definition to evaluate the `+`
function.
On occasion, you may want to generate multiple K definitions from a single
Markdown file. You may also wish to include a block of syntax-highlighted K
code that nonetheless does not appear as part of your K definition. It is
possible to accomplish this by means of the built-in support for syntax
highlighting in Markdown. Markdown allows the `` ``` `` that begins a code
block to be immediately followed by a string which is used to signify what
programming language the following code is written in. However, this feature
actually allows arbitrary text to appear describing that code block. Markdown
parsers are able to parse this text and render the code block differently
depending on what text appears after the backticks.
In K, you can use this functionality to specify one or more
Markdown selectors which are used to describe the code block. A Markdown
selector consists of a sequence of characters containing letters, numbers, and
underscores. A code block can be designated with a single selector by appending
the selector immediately following the backticks that open the code block.
For example, here is a code block with the `foo` selector:
foo bar
Note that this is not K code. By convention, K code should have the `k`
selector on it. You can express multiple selectors on a code block by putting
them between curly braces and prepending each with the `.` character. For
example, here is a code block with the `foo` and `k` selectors:
  syntax Int ::= foo(Int) [function]
  rule foo(0) => 0
Because this code block contains the `k` Markdown selector, by default it is
included as part of the K definition being compiled.
Confirm this fact by using `krun` to evaluate `foo(0)`.
By default, as previously stated, K includes in the definition any code block
with the `k` selector. However, this is merely a specific instance of a general
principle, namely, that K allows you to control which selectors get included
in your K definition. This is done by means of the `--md-selector` flag to
`kompile`. This flag accepts a Markdown selector expression, which you
can essentially think of as a kind of Boolean algebra over Markdown selectors.
Each selector becomes an atom, and you can combine these atoms via the `&`,
`|`, `!`, and `()` operators.
Here is a grammar, written in K, of the language of Markdown selector
expressions:
  syntax Selector ::= r"[0-9a-zA-Z_]+" [token]
  syntax SelectorExp ::= Selector
                       | "(" SelectorExp ")" [bracket]
                       > right:
                         "!" SelectorExp
                       > right:
                         SelectorExp "&" SelectorExp
                       > right:
                         SelectorExp "|" SelectorExp
Here is a selector expression that selects all the K code blocks in this
definition except the one immediately above:
k & (! selector)
This code block exists in order to make the above lesson a syntactically valid
K definition. Consider why it is necessary.
endmodule
Compile this lesson with the selector expression `k & (! foo)` and confirm
that you get a parser error if you try to evaluate the `foo` function with the
resulting definition.
1. Compile Lesson 1.3 as a K definition. Identify why it fails to compile. Then
   pass an appropriate `--md-selector` to the compiler in order to make it
   compile.
2. Modify your calculator application from Lesson 1.7, Exercise 2, to be written
   in a literate style. Consider what text might be appropriate to turn the
   resulting markdown file into documentation for your calculator.
Once you have completed the above exercises, you can continue to
Lesson 1.9: Unparsing and the format and color attributes.
The purpose of this lesson is to teach the user about how terms are
pretty-printed in K, and how the user can make adjustments to the default
settings for how to print specific terms.
When you use `krun` to interpret a program, the tool passes through three major
phases. In the first, parsing, the program itself is parsed using either `kast`
or an ahead-of-time parser generated via Bison, and the resulting AST becomes
the input to the interpreter. In the second phase, execution, K evaluates
functions and (as we will discuss in depth later) performs rewrite steps to
iteratively transform the program state. The third and final phase is called
unparsing, because it consists of taking the final state of the application
after the program has been interpreted, and converting it from an AST back into
text that (in theory, anyway) could be parsed back into the same AST that was
the output of the execution phase.
In practice, parsing is not always precisely reversible. It turns out
(although we are not going to cover exactly why here) that
constructing a sound algorithm that takes a grammar and an AST and emits text
that could be parsed via that grammar to the original AST is an
NP-hard problem. As a result, in the interests of avoiding exponential time
algorithms when users rarely care about unparsing being completely sound, we
take certain shortcuts that provide a linear-time algorithm that approximates
a sound solution to the problem while sacrificing the notion that the result
can be parsed into the exact original term in all cases.
This is a lot of theoretical explanation, but at root, the unparsing process
is fairly simple: it takes a K term that is the output of execution and pretty
prints it according to the syntax defined by the user in their K definition.
This is useful because the original AST is not terribly user-readable, and it
is difficult to visualize the entire term or decipher information about the
final state of the program at a quick glance. Of course, in rare cases, the
pretty-printed configuration loses information of relevance, which is why K
allows you to obtain the original AST on request.
As an example of all of this, consider the following K definition
(`lesson-09-a.k`):
module LESSON-09-A
  imports BOOL

  syntax Exp ::= "(" Exp ")" [bracket]
               | Bool
               > "!" Exp
               > left:
                 Exp "&&" Exp
               | Exp "^" Exp
               | Exp "||" Exp

  syntax Exp ::= id(Exp) [function]
  rule id(E) => E
endmodule
This is similar to the grammar we defined in `LESSON-06-C`, with the difference
that the Boolean expressions are now constructors of sort `Exp` and we define a
trivial function over expressions that returns its argument unchanged.
We can now parse a simple program in this definition and use it to unparse some
Boolean expressions. For example (`exp.bool`):
id(true&&false&&!true^(false||true))
This program is not particularly legible at first glance, because all
extraneous whitespace has been removed. However, if we run `krun exp.bool`, we
see that the unparser pretty-prints this expression rather nicely:
<k>
true && false && ! true ^ ( false || true ) ~> .
</k>
Notably, not only does K insert whitespace where appropriate, it is also smart
enough to insert parentheses where necessary in order to ensure the correct
parse. For example, without those parentheses, the expression above would parse
equivalently to the following one:
(((true && false) && ! true) ^ false) || true
Indeed, you can confirm this by passing that exact expression to the `id`
function and evaluating it, then looking at the result of the unparser:
<k>
true && false && ! true ^ false || true ~> .
</k>
Here, because the meaning of the AST is the same both with and without
parentheses, K does not insert any parentheses when unparsing.
Modify the grammar of `LESSON-09-A` above so that the binary operators are
right associative. Try unparsing `exp.bool` again, and note how the result is
different. Explain the reason for the difference.
You may have noticed that right now, the unparsing of terms is not terribly
imaginative. All it is doing is taking each child of the term, inserting it
into the non-terminal positions of the production, then printing the production
with a space between each terminal or non-terminal. It is easy to see why this
might not be desirable in some cases. Consider the following K definition
(`lesson-09-b.k`):
module LESSON-09-B
  imports BOOL

  syntax Stmt ::= "{" Stmt "}"
                | "{" "}"
                > right:
                  Stmt Stmt
                | "if" "(" Bool ")" Stmt
                | "if" "(" Bool ")" Stmt "else" Stmt [avoid]
endmodule
This is a statement grammar, simplified to the point of meaninglessness, but
still useful as an object lesson in unparsing. Consider the following program
in this grammar (`if.stmt`):
if (true) {
  if (true) {}
  if (false) {}
  if (true) {
    if (false) {} else {}
  } else {
    if (false) {}
  }
}
This is how that term would be unparsed if it appeared in the output of krun:
if ( true ) { if ( true ) { } if ( false ) { } if ( true ) { if ( false ) { } else { } } else { if ( false ) { } } }
This is clearly much less legible than we started with! What are we to do?
Well, K provides an attribute, `format`, that can be applied to any production,
which controls how that production gets unparsed. You've seen how it gets
unparsed by default, but via this attribute, the developer has complete control
over how the term is printed. Of course, the user can trivially create ways to
print terms that would not parse back into the same term. Sometimes this is
even desirable. But in most cases, what you are interested in is controlling
the line breaking, indentation, and spacing of the production.
Here is an example of how you might choose to apply the `format` attribute
to improve how the above term is unparsed (`lesson-09-c.k`):
module LESSON-09-C
  imports BOOL

  syntax Stmt ::= "{" Stmt "}" [format(%1%i%n%2%d%n%3)]
                | "{" "}" [format(%1%2)]
                > right:
                  Stmt Stmt [format(%1%n%2)]
                | "if" "(" Bool ")" Stmt [format(%1 %2%3%4 %5)]
                | "if" "(" Bool ")" Stmt "else" Stmt [avoid, format(%1 %2%3%4 %5 %6 %7)]
endmodule
If we compile this new definition and unparse the same term, this is the
result we get:
if (true) {
  if (true) {}
  if (false) {}
  if (true) {
    if (false) {} else {}
  } else {
    if (false) {}
  }
}
This is the exact same text we started with! By adding the `format` attributes,
we were able to indent the body of code blocks, adjust the spacing of `if`
statements, and put each statement on a new line.
How exactly was this achieved? Well, each time the unparser reaches a term,
it looks at the `format` attribute of that term. That `format` attribute is a
mix of characters and format codes. Format codes begin with the `%`
character. Each character in the `format` attribute other than a format code is
appended verbatim to the output, and each format code is handled according to
its meaning, transformed (possibly recursively) into a string of text, and
spliced into the output at the position the format code appears in the format
string.
Provided for reference is a table with a complete list of all valid format
codes, followed by their meaning:
Format Code | Meaning |
---|---|
n | Insert '\n' followed by the current indentation level |
i | Increase the current indentation level by 1 |
d | Decrease the current indentation level by 1 |
c | Move to the next color in the list of colors for this production (see next section) |
r | Reset color to the default foreground color for the terminal (see next section) |
an integer | Print a terminal or non-terminal from the production. The integer is treated as a 1-based index into the terminals and non-terminals of the production. If the offset refers to a terminal, move to the next color in the list of colors for this production, print the value of that terminal, then reset the color to the default foreground color for the terminal. If the offset refers to a regular expression terminal, it is an error. If the offset refers to a non-terminal, unparse the corresponding child of the current term (starting with the current indentation level) and print the resulting text, then set the current color and indentation level to the color and indentation level following unparsing that term. |
other char | Print that character verbatim |
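To make the index-based codes concrete, here is a small sketch that combines them with `%i`, `%d`, and `%n`. The module `FORMAT-SKETCH` and its `while` and `skip` productions are hypothetical and serve only to show how positions in a production map to format codes.

```k
module FORMAT-SKETCH
  imports BOOL

  // Hypothetical statement grammar used only for illustration.
  // In the "while" production the 1-based positions are:
  //   %1 "while"  %2 "("  %3 Bool  %4 ")"  %5 "{"  %6 Stmt  %7 "}"
  syntax Stmt ::= "skip" ";" [format(%1%2)]
                | "while" "(" Bool ")" "{" Stmt "}" [format(%1 %2%3%4 %5%i%n%6%d%n%7)]
endmodule
```

Read against the table above, this format string should place the loop header on one line, increase the indentation before the line break that precedes the body, and decrease it again before the line break that precedes the closing brace.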
Change the format attributes for `LESSON-09-C` so that `if.stmt` will unparse
as follows:
if (true)
{
  if (true)
  {
  }
  if (false)
  {
  }
  if (true)
  {
    if (false)
    {
    }
    else
    {
    }
  }
  else
  {
    if (false)
    {
    }
  }
}
When the output of unparsing is displayed on a terminal supporting colors, K
is capable of coloring the output, similar to what is possible with a syntax
highlighter. This is achieved via the `color` and `colors` attributes.
Essentially, both the `color` and `colors` attributes are used to construct a
list of colors associated with each production, and then the format attribute
is used to control how those colors are used to unparse the term. At its most
basic level, you can set the `color` attribute to color all the terminals in
the production a certain color, or you can use the `colors` attribute to
specify a comma-separated list of colors for each terminal in the production.
At a more advanced level, the `%c` and `%r` format codes control how the
formatter interacts with the list of colors specified by the `colors`
attribute. You can essentially think of the `color` attribute as a way of
specifying that you want all the colors in the list to be the same color.
Note that the `%c` and `%r` format codes are relatively primitive in nature.
The `color` and `colors` attributes merely maintain a list of colors, whereas
the `%c` and `%r` format codes merely control how to advance through that list
and how individual text is colored.
It is an error if the `colors` attribute does not provide all the colors needed
by the terminals and `%c` format codes in the production. `%r` does not change
the position in the list of colors at all, so the next `%c` will advance to the
following color.
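As a minimal sketch of the `colors` attribute, consider a production with three terminals. The module `COLORS-SKETCH` and its conditional expression are hypothetical; the color names are ones already used elsewhere in this lesson.

```k
module COLORS-SKETCH
  imports BOOL

  // Hypothetical conditional expression used only for illustration.
  // The production has three terminals ("if", "then", "else"), so the
  // colors attribute lists three colors, consumed in order as each
  // terminal is printed.
  syntax Exp ::= Bool
               | "if" Exp "then" Exp "else" Exp [colors(yellow, blue, yellow)]
endmodule
```

With this list, `if` and `else` would be expected to print in yellow and `then` in blue. Writing `color(yellow)` instead would be the shorthand for a list in which all three entries are yellow.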
As a complete example, here is a variant of `LESSON-09-A` which colors the
various Boolean operators:
module LESSON-09-D
  imports BOOL

  syntax Exp ::= "(" Exp ")" [bracket]
               | Bool
               > "!" Exp [color(yellow)]
               > left:
                 Exp "&&" Exp [color(red)]
               | Exp "^" Exp [color(blue)]
               | Exp "||" Exp [color(green)]

  syntax Exp ::= id(Exp) [function]
  rule id(E) => E
endmodule
For a complete list of allowed colors, see
here.
1. Use the `color` attribute on `LESSON-09-C` to color the keywords `true` and
   `false` one color, the keywords `if` and `else` another color, and the
   operators `(`, `)`, `{`, and `}` a third color.
2. Use the `format`, `color`, and `colors` attributes to tell the unparser to
   style the expression grammar from Lesson 1.8, Exercise 2 according to your own
   personal preferences for syntax highlighting and code formatting. You can
   view the result of the unparser on a function term without evaluating that
   function by means of the command `kparse <file> | kore-print -`.
Once you have completed the above exercises, you can continue to
Lesson 1.10: Strings.
The purpose of this lesson is to explain how to use the `String` sort in K to
represent sequences of characters, and explain where to find additional
information about builtin functions over strings.
`String` Sort
In addition to the `Int` and `Bool` sorts covered in
Lesson 1.6, K provides, among others, the
`String` sort to represent sequences of characters. You can import this
functionality via the `STRING-SYNTAX` module, which contains the syntax of
string literals in K, and the `STRING` module, which contains all the functions
that operate over the `String` type.
Strings in K are double-quoted. The following list of escape sequences is
supported:
Escape Sequence | Meaning |
---|---|
`\"` | The literal character " |
`\\` | The literal character \ |
`\n` | The newline character (ASCII code 0x0a) |
`\r` | The carriage return character (ASCII code 0x0d) |
`\t` | The tab character (ASCII code 0x09) |
`\f` | The form feed character (ASCII code 0x0c) |
`\x00` | \x followed by 2 hexadecimal digits indicates a code point between 0x00 and 0xFF |
`\u0000` | \u followed by 4 hexadecimal digits indicates a code point between 0x0000 and 0xFFFF |
`\U00000000` | \U followed by 8 hexadecimal digits indicates a code point between 0x000000 and 0x10FFFF |
Please note that as of the current moment, K's unicode support is not fully
complete, so you may run into errors using code points greater than 0xff.
As an example, you can construct a string literal containing the following
block of text:
This is an example block of text.
Here is a quotation: "Hello world."
	This line is indented.
ÁÉÍÓÚ
Like so:
"This is an example block of text.\nHere is a quotation: \"Hello world.\"\n\tThis line is indented.\n\xc1\xc9\xcd\xd3\xda\n"
The full list of functions provided for the `String` sort can be found in
domains.md, but here we
describe a few of the more basic ones.
The concatenation operator for strings is `+String`. For example, consider
the following K rule that constructs a string from component parts
(`lesson-10.k`):
module LESSON-10
  imports STRING

  syntax String ::= msg(String) [function]
  rule msg(S) => "The string you provided: " +String S +String "\nHave a nice day!"
endmodule
Note that this operator is O(N), so repeated concatenations are inefficient.
For information about efficient string concatenation, refer to
Lesson 2.14.
The function to return the length of a string is `lengthString`. For example,
`lengthString("foo")` will return 3, and `lengthString("")` will return 0.
The return value is the length of the string in code points.
The function to compute the substring of a string is `substrString`. It
takes a string and two indices, starting from 0, and returns the substring
within the range [start..end). It is only defined if `end >= start`,
`start >= 0`, and `end <= length of string`. Here, for example, we return the
first 5 characters of a string:
substrString(S, 0, 5)
Here we return all but the first 3 characters:
substrString(S, 3, lengthString(S))
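The two calls above can be wrapped in a function whose side condition keeps `substrString` within the range where it is defined. This is only a sketch: the module `SUBSTRING-SKETCH` and the helper `dropFirst` are hypothetical names, not part of `domains.md`.

```k
module SUBSTRING-SKETCH
  imports BOOL
  imports INT
  imports STRING

  // dropFirst is a hypothetical helper used only for illustration.
  syntax String ::= dropFirst(String, Int) [function]

  // The side condition restricts the rule to 0 <= N <= lengthString(S),
  // which is the range in which this substrString call is defined.
  rule dropFirst(S, N) => substrString(S, N, lengthString(S))
    requires 0 <=Int N andBool N <=Int lengthString(S)
endmodule
```

Under these assumptions, `dropFirst("hello", 2)` should evaluate to `"llo"`, while `dropFirst("hello", 9)` matches no rule.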
domains.md.
Once you have completed the above exercises, you can continue to
Lesson 1.11: Casting Terms.
The purpose of this lesson is to explain how to use cast expressions in
order to disambiguate terms using sort information. We also explain how the
variable sort inference algorithm works in K, and how to change the default
behavior by casting variables to a particular sort.
Sometimes the grammar you write for your rules in K can be a little bit
ambiguous on purpose. While grammars for programming languages may be
unambiguous when considered in their entirety, K allows you to write rules
involving arbitrary fragments of that grammar, and those fragments can
sometimes be ambiguous by themselves, or similar enough to other fragments
of the grammar to trigger ambiguity. As a result, in addition to the tools
covered in Lesson 1.4, K provides one
additional powerful tool for disambiguation: cast expressions.
K provides three main types of casts: the semantic cast, the strict cast, and
the projection cast. We will cover each of them, and their similarities and
differences, in turn.
The most basic, and most common, type of cast in K is called the
semantic cast. For every sort `S` declared in a module, K provides the
following (implicit) production for use in sentences:
syntax S ::= S ":S"
Note that `S` simply represents the name of the sort. For example, if we
defined a sort `Exp`, the actual production for that sort would be:
syntax Exp ::= Exp ":Exp"
At runtime, this expression will not actually exist; it is merely an annotation
to the compiler describing the sort of the term inside the cast. It is telling
the compiler that the term inside the cast must be of sort `Exp`. For example,
if we had the following grammar:
module LESSON-11-A
  imports INT

  syntax Exp ::= Int
               | Exp "+" Exp
  syntax Stmt ::= "if" "(" Exp ")" Stmt
                | "{" "}"
endmodule
Then we would be able to write `1:Exp`, or `(1 + 2):Exp`, but not `{}:Exp`.
You can also restrict the sort that a variable in a rule will match by casting
it. For example, consider the following additional module:
module LESSON-11-B
  imports LESSON-11-A
  imports BOOL

  syntax Term ::= Exp | Stmt
  syntax Bool ::= isExpression(Term) [function]

  rule isExpression(_E:Exp) => true
  rule isExpression(_) => false [owise]
endmodule
Here we have defined a very simple function that decides whether a term is
an expression or a statement. It does this by casting the variable inside the
`isExpression` rule to sort `Exp`. As a result, that variable will only match
terms of sort `Exp`. Thus, `isExpression(1)` will return true, as will
`isExpression(1 + 2)`, but `isExpression({})` will return false.
Verify this fact for yourself by running `isExpression` on the above examples.
Then write an `isStatement` function, and test that it works as expected.
On occasion, a semantic cast is not strict enough. It might be that you want
to, for disambiguation purposes, say exactly what sort a term is. For
example, consider the following definition:
module LESSON-11-C
  imports INT

  syntax Exp ::= Int
               | "add[" Exp "," Exp "]" [group(exp)]
  syntax Exp2 ::= Exp
                | "add[" Exp2 "," Exp2 "]" [group(exp2)]
endmodule
This grammar is a little ambiguous and contrived, but it serves to demonstrate
how a semantic cast might be insufficient to disambiguate a term. If we were
to write the term `add[ I1:Int , I2:Int ]:Exp2`, the term would be ambiguous,
because the cast is not sufficiently strict to determine whether you mean
to derive the "add" production defined in group `exp` or the one in group `exp2`.
In this situation, there is a solution: the strict cast. For every sort
`S` in your grammar, K also defines the following production:
syntax S ::= S "::S"
This may at first glance seem the same as the previous cast. And indeed,
from the perspective of the grammar and from the perspective of rewriting,
they are in fact identical. However, the second variant has a unique meaning
in the type system of K: namely, the term inside the cast cannot be a
subsort, i.e., a term of another sort `S2` such that the production
`syntax S ::= S2` exists.
As a result, if we were to write in the above grammar the term
`add[ I1:Int , I2:Int ]::Exp2`, then we would know that the second derivation above
should be chosen, whereas if we want the first derivation, we could write
`add[ I1:Int , I2:Int ]::Exp`.
Care must be taken when using a strict cast with brackets. For example, consider a
similar grammar but using an infix "+":
module LESSON-11-D
  imports INT

  syntax Exp ::= Int
               | Exp "+" Exp [group(exp)]
  syntax Exp2 ::= Exp
                | Exp2 "+" Exp2 [group(exp2)]
                | "(" Exp2 ")" [bracket]
endmodule
The term `I1:Int + I2:Int` is ambiguous and could refer to either the production
in group `exp` or the one in group `exp2`. To differentiate, you might try to write
`(I1:Int + I2:Int)::Exp2` similarly to the previous example.
Unfortunately though, this is still ambiguous. Here, the strict cast `::Exp2` applies
directly to the brackets themselves rather than the underlying term within those brackets.
As a result, it enforces that `(I1:Int + I2:Int)` cannot be a strict subsort of `Exp2`, but
it has no effect on the sort of the subterm `I1:Int + I2:Int`.
For cases like this, K provides an alternative syntax for strict casts:
syntax S ::= "{" S "}::S"
The ambiguity can then be resolved with `{I1:Int + I2:Int}::Exp` or
`{I1:Int + I2:Int}::Exp2`.
Thus far we have focused entirely on casts which exist solely to inform the
compiler about the sort of terms. However, sometimes when dealing with grammars
containing subsorts, it can be desirable to reason with the subsort production
itself, which injects one sort into another. Remember from above that such
a production looks like `syntax S ::= S2`. This type of production, called a
subsort production, can be thought of as a type of inheritance involving
constructors. If we have the above production in our grammar, we say that `S2`
is a subsort of `S`, or that any `S2` is also an `S`. K implicitly maintains a
symbol at runtime which keeps track of where such subsortings occur; this
symbol is called an injection.
Sometimes, when one sort is a subsort of another, it can be the case that
a function returns one sort, but you actually want to cast the result of
calling that function to another sort which is a subsort of the first sort.
This is similar to what happens with inheritance in an object-oriented
language, where you might cast a superclass to a subclass if you know for
sure the object at runtime is in fact an instance of that class.
K provides something similar for subsorts: the projection cast.
For each pair of sorts `S` and `S2`, K provides the following production:
syntax S ::= "{" S2 "}" ":>S"
What this means is that you take any term of sort `S2` and cast it to sort
`S`. If the term of sort `S2` consists of an injection containing a term of sort
`S`, then this will return that term. Otherwise, an error occurs and rewriting
fails, returning the projection function which failed to apply. The sort is
not actually checked at compilation time; rather, it is a runtime check
inserted into the code that runs when the rule applies.
For example, here is a module that makes use of projection casts:
module LESSON-11-E
  imports INT
  imports BOOL

  syntax Exp ::= Int
               | Bool
               | Exp "+" Exp
               | Exp "&&" Exp

  syntax Exp ::= eval(Exp) [function]
  rule eval(I:Int) => I
  rule eval(B:Bool) => B
  rule eval(E1 + E2) => {eval(E1)}:>Int +Int {eval(E2)}:>Int
  rule eval(E1 && E2) => {eval(E1)}:>Bool andBool {eval(E2)}:>Bool
endmodule
Here we have defined constructors for a simple expression language over
Booleans and integers, as well as a function `eval` that evaluates these
expressions to a value. Because that value could be an integer or a Boolean,
we need the casts in the last two rules in order to meet the type signature of
`+Int` and `andBool`. Of course, the user can write ill-formed expressions like
`1 && true` or `false + true`, but these will cause errors at runtime, because
the projection cast will fail.
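The runtime nature of the check can also be seen in a smaller sketch. The module `PROJECTION-SKETCH`, the sort `Val`, and the functions `valueAt` and `doubleAt` are hypothetical names introduced only for this illustration.

```k
module PROJECTION-SKETCH
  imports BOOL
  imports INT

  // Val, valueAt, and doubleAt are hypothetical names used only for
  // illustration.
  syntax Val ::= Int | Bool

  // Returns an Int for index 0 and a Bool for every other index.
  syntax Val ::= valueAt(Int) [function]
  rule valueAt(0) => 42
  rule valueAt(_) => true [owise]

  // valueAt returns a Val, but *Int expects an Int, so the result is
  // projected from Val to Int.
  syntax Int ::= doubleAt(Int) [function]
  rule doubleAt(N) => {valueAt(N)}:>Int *Int 2
endmodule
```

Here `doubleAt(0)` should evaluate to 84, because `valueAt(0)` is an injected `Int`. A call such as `doubleAt(1)` would be expected to get stuck on the projection, since `valueAt(1)` returns a `Bool`.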
1. Extend the `eval` function in `LESSON-11-E` to include Strings and add a `.`
   operator which concatenates them.
2. Modify your solution from Lesson 1.9, Exercise 2 by using an `Exp` sort to
   express the integer and Boolean expressions that it supports, in the same style
   as `LESSON-11-E`. Then write an `eval` function that evaluates all terms of
   sort `Exp` to either a `Bool` or an `Int`.
Once you have completed the above exercises, you can continue to
Lesson 1.12: Syntactic Lists.
The purpose of this lesson is to explain how K provides support for syntactic
repetition through the use of the `List{}` and `NeList{}` constructs,
generally called syntactic lists.
`List{}` construct
Sometimes, when defining a grammar in K, it is useful to define a syntactic
construct consisting of an arbitrary-length sequence of items. For example,
you might wish to define a function call construct, and need to express a way
of passing arguments to the function. You can in theory simply define these
productions using ordinary constructors, but it can be tricky to get the syntax
exactly right in K without a lot of tedious glue code.
For this reason, K provides a way of specifying that a non-terminal represents
a syntactic list (`lesson-12-a.k`):
module LESSON-12-A-SYNTAX
  imports INT-SYNTAX

  syntax Ints ::= List{Int,","}
endmodule

module LESSON-12-A
  imports LESSON-12-A-SYNTAX
endmodule
Note that instead of a sequence of terminals and non-terminals, the right hand
side of the `Ints` production contains the symbol `List` followed by two items
in curly braces. The first item is the non-terminal which is the element type
of the list, and the second item is a terminal representing the separator of
the list. As a special case, lists which are separated only by whitespace can
be specified with a separator of `""`.
This `List{}` construct is roughly equivalent to the following definition
(`lesson-12-b.k`):
module LESSON-12-B-SYNTAX
  imports INT-SYNTAX

  syntax Ints ::= Int "," Ints | ".Ints"
endmodule

module LESSON-12-B
  imports LESSON-12-B-SYNTAX
endmodule
As you can see, the `List{}` construct represents a cons-list with an element
at the head and another list at the tail. The empty list is represented by
a `.` followed by the sort of the list.
However, the `List{}` construct provides several key syntactic conveniences
over the above definition. First of all, when writing a list in a rule,
explicitly writing the terminator is not always required. For example, consider
the following additional module (`lesson-12-c.k`):
module LESSON-12-C
  imports LESSON-12-A
  imports INT

  syntax Int ::= sum(Ints) [function]
  rule sum(I:Int) => I
  rule sum(I1:Int, I2:Int, Is:Ints) => sum(I1 +Int I2, Is)
endmodule
Here we see a function that sums together a non-empty list of integers. Note in
particular the first rule. We do not explicitly mention `.Ints`, but in fact,
the rule in question is equivalent to the following rule:
rule sum(I:Int, .Ints) => I
The reason for this is that K will automatically insert a list terminator
anywhere a syntactic list is expected, but an element of that list appears
instead. This works even with lists of more than one element:
rule sum(I1:Int, I2:Int) => I1 +Int I2
This rule is redundant, but here we explicitly match a list of exactly two
elements, because the .Ints is implicitly added after I2.
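To make the implicit terminator concrete, here is a sketch of the same sum
function with every terminator written out by hand. The module names
TERMINATOR-SKETCH-SYNTAX and TERMINATOR-SKETCH are hypothetical; the rules are
intended to behave the same as the ones in LESSON-12-C.

```k
module TERMINATOR-SKETCH-SYNTAX
  imports INT-SYNTAX
  syntax Ints ::= List{Int,","}
endmodule

module TERMINATOR-SKETCH
  imports TERMINATOR-SKETCH-SYNTAX
  imports INT

  syntax Int ::= sum(Ints) [function]

  // Same meaning as `rule sum(I:Int) => I`, with the terminator spelled out.
  rule sum(I:Int, .Ints) => I

  // Is already has sort Ints, so it stands for the whole tail of the list
  // and no terminator is inserted after it.
  rule sum(I1:Int, I2:Int, Is:Ints) => sum(I1 +Int I2, Is)
endmodule
```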
An additional syntactic convenience takes place when you want to express a
syntactic list in the input to krun. In this case, K will automatically
transform the grammar in LESSON-12-B-SYNTAX into the following
(lesson-12-d.k):
module LESSON-12-D
  imports INT-SYNTAX
  syntax Ints ::= #NonEmptyInts | #IntsTerminator
  syntax #NonEmptyInts ::= Int "," #NonEmptyInts
                         | Int #IntsTerminator
  syntax #IntsTerminator ::= ""
endmodule
This allows you to express the usual comma-separated list of arguments where
an empty list is represented by the empty string, and you don't have to
explicitly terminate the list. Because of this, we can write the syntax
of function calls in C very easily (lesson-12-e.k):
module LESSON-12-E
  syntax Id ::= r"[a-zA-Z_][a-zA-Z0-9_]*" [token]
  syntax Exp ::= Id | Exp "(" Exps ")"
  syntax Exps ::= List{Exp,","}
endmodule
Write a function concat which takes a list of String and concatenates them
all together. Do not worry if the function is O(n^2).
Test your implementation using the syntactic sugar for lists added by the parser.
Then write some function call expressions using identifiers in C and verify with
kast that the above grammar captures the intended syntax. Make sure to test
with function calls with zero, one, and two or more arguments.
NeList{} construct
One limitation of the List{} construct is that it is always possible to
write a list of zero elements where a List{} is expected. While this is
desirable in a number of cases, it is sometimes not what the grammar expects.
For example, in C, an enum definition is not allowed to have zero members. In
other words, if we were to write the grammar for enumerations like
so (lesson-12-f.k):
module LESSON-12-F
  syntax Id ::= r"[a-zA-Z_][a-zA-Z0-9_]*" [token]
  syntax Exp ::= Id
  syntax EnumSpecifier ::= "enum" Id "{" Ids "}"
  syntax Ids ::= List{Id,","}
endmodule
Then we would be syntactically allowed to write enum X {}, which ought
instead to be a syntax error.
For this reason, we introduce the additional NeList{} construct. The syntax
is identical to List{}, except with NeList instead of List before the
curly braces. When parsing rules, it behaves identically to the List{}
construct. However, when parsing inputs to krun, the above grammar, if we
replaced syntax Ids ::= List{Id,","} with syntax Ids ::= NeList{Id,","},
would become equivalent to the following (lesson-12-g.k):
module LESSON-12-G
  syntax Id ::= r"[a-zA-Z_][a-zA-Z0-9_]*" [token]
  syntax Exp ::= Id
  syntax EnumSpecifier ::= "enum" Id "{" Ids "}"
  syntax Ids ::= Id | Id "," Ids
endmodule
In other words, only non-empty lists of Id would be allowed.
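For reference, this is a sketch of what the enumeration grammar looks like with
the replacement applied, that is, the form you would actually write rather than
the desugared one. The module name LESSON-12-F-NE is made up for this
illustration.

```k
module LESSON-12-F-NE
  syntax Id ::= r"[a-zA-Z_][a-zA-Z0-9_]*" [token]
  syntax Exp ::= Id
  syntax EnumSpecifier ::= "enum" Id "{" Ids "}"

  // NeList instead of List: an input such as `enum X {}` should no longer
  // parse, while `enum X { A }` and `enum X { A, B }` still do.
  syntax Ids ::= NeList{Id,","}
endmodule
```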
Modify the sum function in LESSON-12-C so that the Ints sort is an
NeList{}. Verify that calling sum() with no arguments is now a syntax
error.
Write a modified sum function with the List construct that can also sum
up an empty list of arguments. In such a case, the sum ought to be 0.
Once you have completed the above exercises, you can continue to
Lesson 1.13: Basics of K Rewriting.
The purpose of this lesson is to explain how rewrite rules that are not the
definition of a function behave, and how, using these rules, you can construct
a semantics of programs in a programming language in K.
Recall from Lesson 1.2 that we have, thus far,
introduced two types of productions in K: constructors and functions.
A function is identified by the function attribute placed on the
production. As you may recall, when we write a rule with a function on the
left-hand side of the => operator, we are defining the meaning of that
function for inputs which match the patterns on the left-hand side of the rule.
If the arguments to the function match the patterns, then the function is
evaluated to the value constructed by substituting the bindings for the
variables into the right-hand side of the rule.
However, function rules are not the only type of rule permissible in K, nor
even the most frequently used. K also has a concept of a
top-level rewrite rule. The simplest way to ensure that a rule is treated
as a top-level rule is for the left-hand side of the rule to mention one or
more cells. We will cover how cells work and are declared in more detail
in a later lesson, but for now, what you should know is that when we ran krun
in our very first example in Lesson 1.2 and got the following output:
<k>
Yellow ( ) ~> .
</k>
<k>
is a cell, known by convention as the K cell. This cell is available
by default in any definition without needing to be explicitly declared.
The K cell contains a single term of sort K. K is a predefined sort in K
with two constructors, that can be roughly represented by the following
grammar:
syntax K ::= KItem "~>" K
           | "."
As a syntactic convenience, K allows you to treat ~> as though it were an
associative list (i.e., as if it were defined as syntax K ::= K "~>" K).
When a definition is compiled, it will automatically transform the rules you
write so that they treat the K sort as a cons-list. Another syntactic
convenience is that, for disambiguation purposes, you can write .K anywhere
you would otherwise write . and the meaning is identical.
Now, you may notice that the above grammar mentions the sort KItem. This is
another built-in sort in K. For every sort S declared in a definition (with
the exception of K and KItem), K will implicitly insert the following
production:
syntax KItem ::= S
In other words, every sort is a subsort of the sort KItem, and thus a term
of any sort can be injected as an element of a term of sort K, also called
a K sequence.
By default, when you krun a program, the AST of the program is inserted as
the sole element of a K sequence into the <k> cell. This explains why we
saw the output we did in Lesson 1.2.
With these preliminaries in mind, we can now explain how top-level rewrite
rules work in K. Put simply, any rule where there is a cell (such as the K
cell) at the top on the left-hand side will be a top-level rewrite rule. Once
the initial program has been inserted into the K cell, the resulting term,
called the configuration, will be matched against all the top-level
rewrite rules in the definition. If only one rule matches, the substitution
generated by the matching will be applied to the right-hand side of the rule
and the resulting term is rewritten to be the new configuration. Rewriting
proceeds by iteratively applying rules, also called taking steps, until
no top-level rewrite rule can be applied. At this point the configuration
becomes the final configuration and is output by krun.
If more than one top-level rule applies, by default, K will pick just one
of those rules, apply it, and continue rewriting. However, it is
non-deterministic which rule applies. In theory, it could be any of them.
By passing the --search flag to krun, you are able to tell krun to
explore all possible non-deterministic choices, and generate a complete list of
all possible final configurations reachable by each non-deterministic choice
that can be made. Note that the --search flag to krun only works if you pass
--enable-search to kompile first.
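The following is a small sketch of a definition in which two top-level rules
match the same configuration. The module names and the Coin sort are invented
for this illustration. Given the program flip, a plain krun is expected to end
with either heads or tails in the K cell, while krun --search (after compiling
with --enable-search) should report both final configurations.

```k
module NONDET-SKETCH-SYNTAX
  syntax Coin ::= "flip" | "heads" | "tails"
endmodule

module NONDET-SKETCH
  imports NONDET-SKETCH-SYNTAX

  // Both rules match a K cell whose first element is `flip`,
  // so either one may be chosen when rewriting.
  rule <k> flip ~> K:K </k> => <k> heads ~> K </k>
  rule <k> flip ~> K:K </k> => <k> tails ~> K </k>
endmodule
```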
Unlike top-level rewrite rules, function rules are not associated with any
particular set of cells in the configuration (although they can contain cells
in their function arguments and return value). While top-level rewrite rules
apply to the entire term being rewritten, function rules apply anywhere an
application of that function appears, and the application is immediately
rewritten to its return value in that position.
Another key distinction between top-level rules and function rules is that
function symbols, i.e., productions with the function attribute, are
mathematical functions rather than constructors. While a constructor is
logically distinct from any other constructor of the same sort, and can be
matched against unconditionally, a function does not necessarily have the
same restriction unless it happens to be an injective function. Thus, two
function symbols with different arguments may still ultimately produce the
same value and thus compare equal to one another. Due to this, concrete
execution (i.e., all K definitions introduced thus far; see Lesson 1.21)
introduces the restriction that you cannot match on a function symbol on the
left-hand side of a rule, except as the top symbol on the left-hand side of
a function rule. This restriction will be lifted later when we introduce the
Haskell Backend, which performs symbolic execution.
Pass a program containing no functions to krun. You can use a term of sort
Exp from LESSON-11-E. Observe the output and try to understand why you get
the output you do. Then write two rules that rewrite that program to another.
Run krun --search on that program and observe both results. Then add a third
rule that rewrites one of those results again. Test that that rule applies as
well.
Thus far, we have focused primarily on defining functions over constructors
in K. However, now that we have a basic understanding of top-level rules,
it is possible to introduce a rewrite system to our definitions. A rewrite
system is a collection of top-level rewrite rules which performs an organized
transformation of a particular program into a result which expresses the
meaning of that program. For example, we might rewrite an expression in a
programming language into a value representing the result of evaluating that
expression.
Recall that in Lesson 1.11 we wrote a simple grammar of Boolean and integer
expressions that looked roughly like this (lesson-13-a.k):
module LESSON-13-A
  imports INT
  syntax Exp ::= Int
               | Bool
               | Exp "+" Exp
               | Exp "&&" Exp
endmodule
In that lesson, we defined a function eval which evaluated such expressions
to either an integer or Boolean.
However, it is more idiomatic to evaluate such expressions using top-level
rewrite rules. Here is how one might do so in K (lesson-13-b.k):
module LESSON-13-B-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  imports BOOL-SYNTAX
  syntax Val ::= Int | Bool
  syntax Exp ::= Val
               > left: Exp "+" Exp
               > left: Exp "&&" Exp
endmodule
module LESSON-13-B
  imports LESSON-13-B-SYNTAX
  imports INT
  imports BOOL
  rule <k> I1:Int + I2:Int ~> K:K </k> => <k> I1 +Int I2 ~> K </k>
  rule <k> B1:Bool && B2:Bool ~> K:K </k> => <k> B1 andBool B2 ~> K </k>
  syntax KItem ::= freezer1(Val) | freezer2(Exp)
                 | freezer3(Val) | freezer4(Exp)
  rule <k> E1:Val + E2:Exp ~> K:K </k> => <k> E2 ~> freezer1(E1) ~> K </k> [priority(51)]
  rule <k> E1:Exp + E2:Exp ~> K:K </k> => <k> E1 ~> freezer2(E2) ~> K </k> [priority(52)]
  rule <k> E1:Val && E2:Exp ~> K:K </k> => <k> E2 ~> freezer3(E1) ~> K </k> [priority(51)]
  rule <k> E1:Exp && E2:Exp ~> K:K </k> => <k> E1 ~> freezer4(E2) ~> K </k> [priority(52)]
  rule <k> E2:Val ~> freezer1(E1) ~> K:K </k> => <k> E1 + E2 ~> K </k>
  rule <k> E1:Val ~> freezer2(E2) ~> K:K </k> => <k> E1 + E2 ~> K </k>
  rule <k> E2:Val ~> freezer3(E1) ~> K:K </k> => <k> E1 && E2 ~> K </k>
  rule <k> E1:Val ~> freezer4(E2) ~> K:K </k> => <k> E1 && E2 ~> K </k>
endmodule
This is of course rather cumbersome currently, but we will soon introduce
syntactic convenience which makes writing definitions of this type considerably
easier. For now, notice that there are roughly three types of rules here: the
first matches a K cell in which the first element of the K sequence is an Exp
whose arguments are values, and rewrites the first element of the sequence to
the result of that expression. The second also matches a K cell with an Exp in
the first element of its K sequence, but it matches when one or both arguments
of the Exp are not values, and replaces the first element of the K sequence
with two new elements: one being an argument to evaluate, and the other being
a special constructor called a freezer. Finally, the third matches a K
sequence where a Val is first, and a freezer is second, and replaces them
with a partially evaluated expression.
This general pattern is what is known as heating an expression,
evaluating its arguments, cooling the arguments into the expression
again, and evaluating the expression itself. By repeatedly performing
this sequence of actions, we can evaluate an entire AST containing a complex
expression down into its resulting value.
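As a worked illustration, here is a hand-derived sequence of K cell contents
for the program 1 + 2 + 3 under the rules of LESSON-13-B. This is a sketch
traced by hand from the rules above, not captured tool output, and krun may
print the same terms with different spacing.

```k
<k> 1 + 2 + 3 ~> . </k>              // initial: parsed as (1 + 2) + 3
<k> 1 + 2 ~> freezer2(3) ~> . </k>   // heat: only the priority(52) rule for + matches
<k> 3 ~> freezer2(3) ~> . </k>       // evaluate: 1 +Int 2
<k> 3 + 3 ~> . </k>                  // cool: the value is plugged back via freezer2
<k> 6 ~> . </k>                      // evaluate: 3 +Int 3; no further rule applies
```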
Write an addition expression with integers. Use krun --depth 1 to see the
result of rewriting after applying a single top-level rule. Gradually increase
the value of --depth to see successive states. Observe how this combination
of rules is eventually able to evaluate the entire expression.
As you saw above, the definition we wrote is rather cumbersome. Over the
remainder of Lessons 1.13 and 1.14, we will greatly simplify it. The first step
in doing so is to teach a bit more about the rewrite operator, =>. Thus far,
all the rules we have written look like rule LHS => RHS. However, this is not
the only way the rewrite operator can be used. It is actually possible to place
a constructor or function at the very top of the rule, and place rewrite
operators inside that term. While a rewrite operator cannot appear nested
inside another rewrite operator, by doing this, we can express that some parts
of what we are matching are not changed by the rewrite operator. For
example, consider the following rule from above:
rule <k> I1:Int + I2:Int ~> K:K </k> => <k> I1 +Int I2 ~> K </k>
We can equivalently write it as follows:
rule <k> (I1:Int + I2:Int => I1 +Int I2) ~> _:K </k>
When you put a rewrite inside a term like this, in essence, you are telling
the rule to only rewrite part of the left-hand side to the right-hand side.
In practice, this is implemented by lifting the rewrite operator to the top of
the rule by means of duplicating the surrounding context.
There is a way that the above rule can be simplified further, however. K
provides a special syntax for each cell containing a term of sort K, indicating
that we want to match only on some prefix of the K sequence. For example, the
above rule can be simplified further like so:
rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>
Here we have placed the symbol ... immediately prior to the </k> which ends
the cell. What this tells the compiler is to take the contents of the cell,
treat it as the prefix of a K sequence, and insert an anonymous variable of
sort K at the end. Thus we can think of ... as a way of saying we
don't care about the part of the K sequence after the beginning, leaving
it unchanged.
Putting all this together, we can rewrite LESSON-13-B like so
(lesson-13-c.k):
module LESSON-13-C-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  imports BOOL-SYNTAX
  syntax Val ::= Int | Bool
  syntax Exp ::= Val
               > left: Exp "+" Exp
               > left: Exp "&&" Exp
endmodule
module LESSON-13-C
  imports LESSON-13-C-SYNTAX
  imports INT
  imports BOOL
  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>
  rule <k> B1:Bool && B2:Bool => B1 andBool B2 ...</k>
  syntax KItem ::= freezer1(Val) | freezer2(Exp)
                 | freezer3(Val) | freezer4(Exp)
  rule <k> E1:Val + E2:Exp => E2 ~> freezer1(E1) ...</k> [priority(51)]
  rule <k> E1:Exp + E2:Exp => E1 ~> freezer2(E2) ...</k> [priority(52)]
  rule <k> E1:Val && E2:Exp => E2 ~> freezer3(E1) ...</k> [priority(51)]
  rule <k> E1:Exp && E2:Exp => E1 ~> freezer4(E2) ...</k> [priority(52)]
  rule <k> E2:Val ~> freezer1(E1) => E1 + E2 ...</k>
  rule <k> E1:Val ~> freezer2(E2) => E1 + E2 ...</k>
  rule <k> E2:Val ~> freezer3(E1) => E1 && E2 ...</k>
  rule <k> E1:Val ~> freezer4(E2) => E1 && E2 ...</k>
endmodule
This is still rather cumbersome, but it is already greatly simplified. In the
next lesson, we will see how additional features of K can be used to specify
heating and cooling rules much more compactly.
Modify LESSON-13-C to add rules to evaluate integer subtraction.
Once you have completed the above exercises, you can continue to
Lesson 1.14: Defining Evaluation Order.
The purpose of this lesson is to explain how to use the heat and cool
attributes, context and context alias sentences, and the strict and
seqstrict attributes to more compactly express heating and cooling in K,
and to express more advanced evaluation strategies in K.
heat and cool attributes
Thus far, we have been using rule priority and casts to express when to heat
an expression and when to cool it. For example, the rules for heating have
lower priority, so they do not apply if the term could be evaluated instead,
and the rules for cooling are expressly written to apply only if the argument
of the expression is a value.
However, K has built-in support for deciding when to heat and when to cool.
This support comes in the form of the rule attributes heat and cool as
well as the specially named function isKResult.
Consider the following definition, which is equivalent to LESSON-13-C
(lesson-14-a.k):
module LESSON-14-A-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  imports BOOL-SYNTAX
  syntax Exp ::= Int
               | Bool
               > left: Exp "+" Exp
               > left: Exp "&&" Exp
endmodule
module LESSON-14-A
  imports LESSON-14-A-SYNTAX
  imports INT
  imports BOOL
  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>
  rule <k> B1:Bool && B2:Bool => B1 andBool B2 ...</k>
  syntax KItem ::= freezer1(Exp) | freezer2(Exp)
                 | freezer3(Exp) | freezer4(Exp)
  rule <k> E:Exp + HOLE:Exp => HOLE ~> freezer1(E) ...</k>
    requires isKResult(E) [heat]
  rule <k> HOLE:Exp + E:Exp => HOLE ~> freezer2(E) ...</k> [heat]
  rule <k> E:Exp && HOLE:Exp => HOLE ~> freezer3(E) ...</k>
    requires isKResult(E) [heat]
  rule <k> HOLE:Exp && E:Exp => HOLE ~> freezer4(E) ...</k> [heat]
  rule <k> HOLE:Exp ~> freezer1(E) => E + HOLE ...</k> [cool]
  rule <k> HOLE:Exp ~> freezer2(E) => HOLE + E ...</k> [cool]
  rule <k> HOLE:Exp ~> freezer3(E) => E && HOLE ...</k> [cool]
  rule <k> HOLE:Exp ~> freezer4(E) => HOLE && E ...</k> [cool]
  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_:Bool) => true
  rule isKResult(_) => false [owise]
endmodule
We have introduced three major changes to this definition. First, we have
removed the Val sort. We replace it instead with a function isKResult.
The function in question must have the same signature and attributes as seen in
this example. It ought to return true whenever a term should not be heated
(because it is a value) and false when it should be heated (because it is not
a value). We thus also insert isKResult calls in the side condition of two
of the heating rules, where the Val sort was previously used.
Second, we have removed the rule priorities on the heating rules and the use of
the Val sort on the cooling rules, and replaced them with the heat and
cool attributes. These attributes instruct the compiler that these rules are
heating and cooling rules, and thus should implicitly apply only when certain
terms on the LHS either are or are not a KResult (i.e., isKResult returns
true versus false).
Third, we have renamed some of the variables in the heating and cooling rules
to the special variable HOLE. Syntactically, HOLE is just a special name
for a variable, but it is treated specially by the compiler. By naming a
variable HOLE, we have informed the compiler which term is being heated
or cooled. The compiler will automatically insert the side condition
requires isKResult(HOLE) to cooling rules and the side condition
requires notBool isKResult(HOLE) to heating rules.
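To make the implicit side conditions visible, here is a sketch of the addition
fragment of LESSON-14-A written as ordinary rules, with the conditions that
heat and cool would add spelled out by hand. The module names are made up, the
variable X takes the place of HOLE so that no special treatment is involved,
and the way the two conditions on the first heating rule are combined with
andBool is an assumption of this sketch rather than a description of the
compiler's output.

```k
module HEAT-COOL-EXPLICIT-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  syntax Exp ::= Int
               > left: Exp "+" Exp
endmodule

module HEAT-COOL-EXPLICIT
  imports HEAT-COOL-EXPLICIT-SYNTAX
  imports INT
  imports BOOL

  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>

  syntax KItem ::= freezer1(Exp) | freezer2(Exp)

  // Heating: only pull X out if it is not yet a result.
  rule <k> E:Exp + X:Exp => X ~> freezer1(E) ...</k>
    requires isKResult(E) andBool notBool isKResult(X)
  rule <k> X:Exp + E:Exp => X ~> freezer2(E) ...</k>
    requires notBool isKResult(X)

  // Cooling: only plug X back in once it is a result.
  rule <k> X:Exp ~> freezer1(E) => E + X ...</k>
    requires isKResult(X)
  rule <k> X:Exp ~> freezer2(E) => X + E ...</k>
    requires isKResult(X)

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_) => false [owise]
endmodule
```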
Modify LESSON-14-A to add rules to evaluate integer subtraction.
The above example is still rather cumbersome to write. We must explicitly write
both the heating and the cooling rule separately, even though they are
essentially inverses of one another. It would be nice to instead simply
indicate which terms should be heated and cooled, and what part of them to
operate on.
To do this, K introduces a new type of sentence, the context. Contexts
begin with the context keyword instead of the rule keyword, and usually
do not contain a rewrite operator.
Consider the following definition which is equivalent to LESSON-14-A
(lesson-14-b.k):
module LESSON-14-B-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  imports BOOL-SYNTAX
  syntax Exp ::= Int
               | Bool
               > left: Exp "+" Exp
               > left: Exp "&&" Exp
endmodule
module LESSON-14-B
  imports LESSON-14-B-SYNTAX
  imports INT
  imports BOOL
  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>
  rule <k> B1:Bool && B2:Bool => B1 andBool B2 ...</k>
  context <k> E:Exp + HOLE:Exp ...</k>
    requires isKResult(E)
  context <k> HOLE:Exp + _:Exp ...</k>
  context <k> E:Exp && HOLE:Exp ...</k>
    requires isKResult(E)
  context <k> HOLE:Exp && _:Exp ...</k>
  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_:Bool) => true
  rule isKResult(_) => false [owise]
endmodule
In this example, the heat and cool rules have been removed entirely, as
have been the productions defining the freezers. Don't worry, they still exist
under the hood; the compiler is just generating them automatically. For each
context sentence like above, the compiler generates a #freezer production,
a heat rule, and a cool rule. The generated form is equivalent to the
rules we wrote manually in LESSON-14-A. However, we are now starting to
considerably simplify the definition. Instead of 3 sentences, we just have one.
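The correspondence can be sketched for a single operator as follows. The module
names are hypothetical, and freezerLeft in the comments is a made-up stand-in
for the #freezer production the compiler generates; the commented sentences
mirror the hand-written freezer2 rules of LESSON-14-A rather than actual
compiler output.

```k
module CONTEXT-SKETCH-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  syntax Exp ::= Int
               > left: Exp "+" Exp
endmodule

module CONTEXT-SKETCH
  imports CONTEXT-SKETCH-SYNTAX
  imports INT
  imports BOOL

  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>

  // This one sentence ...
  context <k> HOLE:Exp + _:Exp ...</k>
  // ... plays the role of three sentences of roughly this shape:
  //   syntax KItem ::= freezerLeft(Exp)
  //   rule <k> HOLE:Exp + E:Exp => HOLE ~> freezerLeft(E) ...</k> [heat]
  //   rule <k> HOLE:Exp ~> freezerLeft(E) => HOLE + E ...</k> [cool]

  // The right operand is heated only once the left operand is a result.
  context <k> E:Exp + HOLE:Exp ...</k>
    requires isKResult(E)

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_) => false [owise]
endmodule
```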
context alias sentences and the strict and seqstrict attributes
Notice that the contexts we included in LESSON-14-B still seem rather
similar in form. For each expression we want to evaluate, we are declaring
one context for each operand of that expression, and they are each rather
similar to one another. We would like to be able to simplify further by
simply annotating each expression production with information about how
it is to be evaluated instead. We can do this with the seqstrict attribute.
Consider the following definition, once again equivalent to those above
(lesson-14-c.k):
module LESSON-14-C-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  imports BOOL-SYNTAX
  syntax Exp ::= Int
               | Bool
               > left: Exp "+" Exp [seqstrict(exp; 1, 2)]
               > left: Exp "&&" Exp [seqstrict(exp; 1, 2)]
endmodule
module LESSON-14-C
  imports LESSON-14-C-SYNTAX
  imports INT
  imports BOOL
  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>
  rule <k> B1:Bool && B2:Bool => B1 andBool B2 ...</k>
  context alias [exp]: <k> HERE ...</k>
  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_:Bool) => true
  rule isKResult(_) => false [owise]
endmodule
This definition has two important changes from the one above. The first is
that the individual context sentences have been removed and have been
replaced with a single context alias sentence. You may notice that this
sentence begins with an identifier in square brackets followed by a colon. This
syntax is a way of naming individual sentences in K for reference by the tool
or by other sentences. The context alias sentence also has a special variable
HERE.
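The second is that the productions in LESSON-14-C-SYNTAX have been given a
seqstrict attribute. The value of this attribute has two parts. The first
is the name of a context alias sentence. The second is a comma-separated list
of integers. Each integer represents an index of a non-terminal in the
production, counting from 1. For each integer present, the compiler implicitly
generates a new context sentence according to the following rules:
1. The compiler starts by looking up the context alias sentence named in the
   attribute. If there is more than one, then one context sentence is created
   per context alias sentence with that name.
2. For each context created, the variable HERE in the context alias is
   substituted with an instance of the production the seqstrict attribute is
   attached to. Each child of that production is a variable. The non-terminal
   indicated by the integer offset of the seqstrict attribute is given the
   name HOLE.
3. For each integer offset prior in the list to the one currently being
   processed, the predicate isKResult(E) is conjuncted together and included
   as a side condition, where E is the child of the production term with that
   offset, starting from 1. For example, if the attribute lists 1, 2, then
   the context generated for offset 2 will include isKResult(E1) where E1 is
   the first child of the production.
As you can see if you work through the process, the above code will ultimately
generate the same contexts present in LESSON-14-B.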
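To see how the side conditions accumulate, consider a sketch with a
three-operand construct. The add3 production and the module names are invented
for this illustration, and the contexts shown in comments are what the rules
described above should yield when worked through by hand; they are not copied
from compiler output.

```k
module SEQSTRICT-SKETCH-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  // Hypothetical three-operand construct, used only to show the expansion.
  syntax Exp ::= Int
               | "add3" "(" Exp "," Exp "," Exp ")" [seqstrict(exp; 1, 2, 3)]
endmodule

module SEQSTRICT-SKETCH
  imports SEQSTRICT-SKETCH-SYNTAX
  imports INT
  imports BOOL

  context alias [exp]: <k> HERE ...</k>

  // Expected expansion, one context per listed offset:
  //   context <k> add3(HOLE:Exp, _:Exp, _:Exp) ...</k>
  //   context <k> add3(E1:Exp, HOLE:Exp, _:Exp) ...</k>
  //     requires isKResult(E1)
  //   context <k> add3(E1:Exp, E2:Exp, HOLE:Exp) ...</k>
  //     requires isKResult(E1) andBool isKResult(E2)

  rule <k> add3(I1:Int, I2:Int, I3:Int) => I1 +Int I2 +Int I3 ...</k>

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_) => false [owise]
endmodule
```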
Finally, note that there are a few minor syntactic conveniences provided by the
seqstrict attribute. First, in the special case of the context alias sentence
being <k> HERE ...</k>, you can omit both the context alias sentence
and the name from the seqstrict attribute.
Second, if the numbered list of offsets contains every non-terminal in the
production, it can be omitted from the attribute value.
Thus, we can finally produce the idiomatic K definition for this example
(lesson-14-d.k):
module LESSON-14-D-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  imports BOOL-SYNTAX
  syntax Exp ::= Int
               | Bool
               > left: Exp "+" Exp [seqstrict]
               > left: Exp "&&" Exp [seqstrict]
endmodule
module LESSON-14-D
  imports LESSON-14-D-SYNTAX
  imports INT
  imports BOOL
  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>
  rule <k> B1:Bool && B2:Bool => B1 andBool B2 ...</k>
  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_:Bool) => true
  rule isKResult(_) => false [owise]
endmodule
Modify LESSON-14-D to add a production and rule to evaluate integer
subtraction.
strict attribute
Thus far, we have focused entirely on deterministic evaluation order. However,
not all languages are deterministic in the order they evaluate expressions.
For example, in C, the expression a() + b() + c() is guaranteed to parse
to (a() + b()) + c(), but it is not guaranteed that a will be called before
b before c. In fact, this evaluation order is non-deterministic.
We can express non-deterministic evaluation orders with the strict attribute.
Its behavior is identical to the seqstrict attribute, except that step 3 in
the above list (with the side condition automatically added) does not take
place. In other words, if we wrote syntax Exp ::= Exp "+" Exp [strict]
instead of syntax Exp ::= Exp "+" Exp [seqstrict], it would generate the
following two contexts instead of the ones found in LESSON-14-B:
context <k> _:Exp + HOLE:Exp ...</k>
context <k> HOLE:Exp + _:Exp ...</k>
As you can see, these contexts will generate heating rules that can both
apply to the same term. As a result, the choice of which heating rule
applies first is non-deterministic, and as we saw in Lesson 1.13, we can
get all possible behaviors by passing --search to krun.
Add integer division to LESSON-14-D. Make division and addition strict
instead of seqstrict, and write a rule evaluating integer division with a
side condition that the denominator is non-zero. Run krun --search on the
program 1 / 0 + 2 / 1 and observe all possible outputs of the program. How
many are there total, and why?
Rework your solution from Lesson 1.9, Exercise 2 to evaluate expressions from
left to right using the seqstrict attribute.
Once you have completed the above exercises, you can continue to
Lesson 1.15: Configuration Declarations and Cell Nesting.
The purpose of this lesson is to explain how to store additional information
about the state of your interpreter by declaring cells using the
configuration sentence, as well as how to add additional inputs to your
definition.
We have already covered the absolute basics of cells in K by looking at the
<k> cell. As explained in Lesson 1.13, the <k> cell is available without
being explicitly declared. It turns out this is because, if the user does not
explicitly specify a configuration sentence anywhere in the main module of
their definition, the configuration sentence from the DEFAULT-CONFIGURATION
module of kast.md is imported automatically. Here is what that sentence looks
like:
configuration <k> $PGM:K </k>
This configuration declaration declares a single cell, the <k> cell. It also
declares that at the start of rewriting, the contents of that cell should be
initialized with the value of the $PGM configuration variable.
Configuration variables function as inputs to krun. These terms are supplied
to krun in the form of ASTs parsed using a particular module. By default, the
$PGM configuration variable uses the main syntax module of the definition.
The cast on the configuration variable also specifies the sort that is used as
the entry point to the parser, in this case the K sort. It is often
useful to cast to other sorts there as well for better control over the accepted
language. The sort used for the $PGM variable is referred to as the start
symbol. During parsing, the default start symbol K subsumes all user-defined
sorts except for syntactic lists. These are excluded because they will always
produce an ambiguity error when parsing a single element.
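As a sketch of casting $PGM to a sort other than K, the definition below uses
Exp as the start symbol, so only input that parses as an Exp should be accepted
as a program. The module names and the one-operator grammar are invented for
this illustration.

```k
module START-SYMBOL-SKETCH-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  syntax Exp ::= Int
               | Exp "+" Exp [left]
endmodule

module START-SYMBOL-SKETCH
  imports START-SYMBOL-SKETCH-SYNTAX
  imports INT

  // The cast on $PGM selects Exp, rather than the default K,
  // as the entry point to the parser.
  configuration <k> $PGM:Exp </k>

  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>
endmodule
```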
Note that we did not explicitly specify the $PGM configuration variable when
we invoked krun on a file. This is because krun handles the $PGM variable
specially, and allows you to pass the term for that variable via a file passed
as a positional argument to krun. We did, however, specify the PGM name
explicitly when we called krun with the -cPGM command line argument in
Lesson 1.2. This is the other, explicit, way of
specifying an input to krun.
This explains the most basic use of configuration declarations in K. We can,
however, declare multiple cells and multiple configuration variables. We can
also specify the initial values of cells statically, rather than dynamically
via krun.
For example, consider the following definition (lesson-15-a.k):
module LESSON-15-A-SYNTAX
  imports INT-SYNTAX
  syntax Ints ::= List{Int,","}
endmodule
module LESSON-15-A
  imports LESSON-15-A-SYNTAX
  imports INT
  configuration <k> $PGM:Ints </k>
                <sum> 0 </sum>
  rule <k> I:Int, Is:Ints => Is ...</k>
       <sum> SUM:Int => SUM +Int I </sum>
endmodule
This simple definition takes a list of integers as input and sums them
together. Here we have declared two cells: <k> and <sum>. Unlike <k>,
<sum> does not get initialized via a configuration variable, but instead
is initialized statically with the value 0.
Note the rule in the second module: we have explicitly specified multiple
cells in a single rule. K will expect each of these cells to match in order for
the rule to apply.
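As a worked illustration, here is a hand-derived sequence of configurations for
LESSON-15-A on the input 1, 2, 3. It is a sketch traced from the rule above,
with the trailing part of the K sequence omitted; krun's printed form of the
final configuration may differ.

```k
<k> 1, 2, 3 </k>  <sum> 0 </sum>   // initial: $PGM in <k>, <sum> set statically
<k> 2, 3 </k>     <sum> 1 </sum>   // I = 1, SUM = 0
<k> 3 </k>        <sum> 3 </sum>   // I = 2, SUM = 1
<k> .Ints </k>    <sum> 6 </sum>   // I = 3, SUM = 3; the empty list matches no rule
```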
Here is a second example (lesson-15-b.k):
module LESSON-15-B-SYNTAX
  imports INT-SYNTAX
endmodule
module LESSON-15-B
  imports LESSON-15-B-SYNTAX
  imports INT
  imports BOOL
  configuration <k> . </k>
                <first> $FIRST:Int </first>
                <second> $SECOND:Int </second>
  rule <k> . => FIRST >Int SECOND </k>
       <first> FIRST </first>
       <second> SECOND </second>
endmodule
This definition takes two integers as command-line arguments and populates the
<k> cell with a Boolean indicating whether the first integer is greater than
the second. Notice that we have specified no $PGM configuration variable
here. As a result, we cannot invoke krun via the syntax krun $file.
Instead, we must explicitly pass values for each configuration variable via the
-cFIRST and -cSECOND command line flags. For example, if we invoke
krun -cFIRST=0 -cSECOND=1, we will get the value false in the K cell.
You can also specify both a $PGM configuration variable and other
configuration variables in a single configuration declaration, in which case
you would be able to initialize $PGM with either a positional argument or the
-cPGM command line flag, but the other configuration variables would need
to be explicitly initialized with -c.
Modify your solution to Lesson 1.14, Exercise 2 to add a new cell with a
configuration variable of sort Bool. This variable should determine whether
the / operator is evaluated using /Int or divInt. Test that by specifying
different values for this variable, you can change the behavior of rounding on
division of negative numbers.
It is possible to nest cells inside one another. A cell that contains other
cells must contain only other cells, but in doing this, you are able to
create a hierarchical structure to the configuration. Consider the following
definition (lesson-15-c.k), which is equivalent to the one in LESSON-15-B:
module LESSON-15-C-SYNTAX
  imports INT-SYNTAX
endmodule
module LESSON-15-C
  imports LESSON-15-C-SYNTAX
  imports INT
  imports BOOL
  configuration <T>
                  <k> . </k>
                  <state>
                    <first> $FIRST:Int </first>
                    <second> $SECOND:Int </second>
                  </state>
                </T>
  rule <k> . => FIRST >Int SECOND </k>
       <first> FIRST </first>
       <second> SECOND </second>
endmodule
Note that we have added some new cells to the configuration declaration:
the <T> cell wraps the entire configuration, and the <state> cell is
introduced around the <first> and <second> cells.
However, we have not changed the rule in this definition. This is because of
a concept in K called configuration abstraction. K allows you to specify
any number of cells in a rule (except zero) in any order you want, and K will
compile the rules into a form that matches the structure of the configuration
specified by the configuration declaration.
Here then, is how this rule would look after the configuration abstraction
has been resolved:
rule <T>
       <k> . => FIRST >Int SECOND </k>
       <state>
         <first> FIRST </first>
         <second> SECOND </second>
       </state>
     </T>
In other words, K will complete cells to the top of the configuration by
inserting parent cells where appropriate based on the declared structure of
the configuration. This is useful because as a definition evolves, the
configuration may change, but you don't want to have to modify every single
rule each time. Thus, K follows the principle that you should only mention the
cells in a rule that are actually needed in order to accomplish its specific
goal. By following this best practice, you can significantly increase the
modularity of the definition and make it easier to maintain and modify.
Note that unlike top-level rewrite rules, cells that appear inside function
rules are not necessarily completed to the top of the configuration. They still
participate in cell completion in the sense that you can mention cell
structure loosely inside a function rule and it will be completed into the
correct cell structure specified by the configuration declaration. However,
they do not complete all the way to the top, instead completing only up to
the top-most cell mentioned in the rule.
For example, suppose we write the following function rule in the above definition:
rule doStuff(<first> FIRST </first>) => FIRST
The function will match only on the <first> cell, rather than the entire
configuration. However, if we had mentioned a parent cell in the rule, it still
would have completed the children of that parent cell as needed to ensure that
the resulting term is well formed.
Modify your definition from the previous exercise in this lesson to wrap the
two cells you have declared in a top cell <T>. You should not have to change
any other rules in the definition.
Sometimes it is desirable to explicitly match a variable against certain
fragments of the configuration. Because K's configuration is hierarchical,
we can grab subsets of the configuration as if they were just another term.
However, configuration abstraction applies here as well.
In particular, for each cell you specify in a configuration declaration, a
unique sort is assigned for that cell with a single constructor (the cell
itself). The sort name is taken by removing all special characters,
capitalizing the first letter and each letter after a hyphen, and adding the
word Cell at the end. For example, in the above example, the cell sorts are
TCell, KCell, StateCell, FirstCell, and SecondCell. If we had declared
a cell as <first-number>, then the cell sort name would be FirstNumberCell.
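The following is a hedged sketch of how a generated cell sort might be used as
the argument sort of a function. The module name, the sum command, and the
addCells function are hypothetical, and the literal defaults 1 and 2 replace
the configuration variables of the lesson purely to keep the sketch small:

```k
// Hypothetical sketch: module, command and function names are illustrative.
module CELL-SORT-SKETCH
  imports INT

  configuration <T>
                  <k> $PGM:K </k>
                  <state>
                    <first> 1 </first>
                    <second> 2 </second>
                  </state>
                </T>

  syntax KItem ::= "sum"

  // StateCell is the sort generated for the <state> cell.
  syntax Int ::= addCells(StateCell) [function]
  rule addCells(<state> <first> A:Int </first> <second> B:Int </second> </state>)
    => A +Int B

  // S matches the whole <state> cell and is passed to the function.
  rule <k> sum => addCells(S) ...</k>
       S:StateCell
endmodule
```

The function rule mentions <state> as its top-most cell, so it is completed
only up to <state> and never mentions <T>.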
You can explicitly reference a variable of one of these sorts anywhere you
might instead write that cell. For example, consider the following rule:
rule <k> true => S </k>
     (S:StateCell => <state>... .Bag ...</state>)
Here we have introduced two new concepts. The first is the variable of sort
StateCell, which matches the entire <state> part of the configuration. The
second is that we have introduced the concept of ... once again. When a cell
contains other cells, it is also possible to specify ... on the left, on the
right, or on both sides of the cell term. All three of these syntaxes are
equivalent in this case. When they appear on the left-hand side of a rule, they
indicate that we don't care what value any cells not explicitly named might
have. For example, we might write <state>... <first> 0 </first> ...</state> on
the left-hand side of a rule in order to indicate that we want to match the
rule when the <first> cell contains a zero, regardless of what the <second>
cell contains. If we had not included this ellipsis, it would have been a
syntax error, because K would have expected you to provide a value for each of
the child cells.
However, if, as in the example above, the ... appeared on the right-hand side
of a rule, this instead indicates that the cells not explicitly mentioned under
the cell should be initialized with their default value from the configuration
declaration. In other words, that rule will set the value of <first> and
<second> to zero.
You may note the presence of the phrase .Bag here. You can think of this as
the empty set of cells. It is used as the child of a cell when you want to
indicate that no cells should be explicitly named. We will cover other uses
of this term in later lessons.
syntax Stmt ::= Bool ";" Exp
, and a rule that uses this Stmt
to set the
syntax Stmt ::= "reset" ";" Exp
which sets the value of the Boolean flag back
...
on the right-hand side. You will need to add
Once you have completed the above exercises, you can continue to
Lesson 1.16: Maps, Semantic Lists, and Sets.
The purpose of this lesson is to explain how to use the data structure sorts
provided by K: maps, lists, and sets.
The most frequently used type of data structure in K is the map. The sort
provided by K for this purpose is the Map sort, and it is provided in
domains.md in the MAP module. This type is not (currently) polymorphic. All Map
terms are maps that map terms of sort KItem to other terms of sort KItem.
A KItem can contain any sort except a K sequence. If you need to store such a
term in a map, you can always use a wrapper such as syntax KItem ::= kseq(K).
A Map pattern consists of zero or more map elements (as represented by the
symbol syntax Map ::= KItem "|->" KItem), mixed in any order, separated by
whitespace, with zero or one variables of sort Map. The empty map is
represented by .Map. If all of the bindings for the variables in the keys
of the map can be deterministically chosen, these patterns can be matched in
O(1) time. If they cannot, then each map element that cannot be
deterministically constructed contributes a single dimension of polynomial
time to the cost of the matching. In other words, a single such element is
linear, two are quadratic, three are cubic, etc.
Patterns like the above are the only type of Map pattern that can appear
on the left-hand-side of a rule. In other words, you are not allowed to write
a Map pattern on the left-hand-side with more than one variable of sort Map
in it. You are, however, allowed to write such patterns on the right-hand-side
of a rule. You can also write a function pattern in the key of a map element
so long as all the variables in the function pattern can be deterministically
chosen.
Note the meaning of matching on a Map pattern: a map pattern with no
variables of sort Map will match if the map being matched has exactly as
many bindings as |-> symbols in the pattern. It will then match if each
binding in the map pattern matches exactly one distinct binding in the map
being matched. A map pattern with one Map variable will also match any map
that contains such a map as a subset. The variable of sort Map will be bound
to whatever bindings are left over (.Map if there are no bindings left over).
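To make these matching rules concrete, here is a small sketch. The module
name, the <store> cell, and the lookup and remove commands are hypothetical
and exist only for illustration:

```k
// Hypothetical sketch: names below are illustrative, not part of the tutorial.
module MAP-PATTERN-SKETCH
  imports INT
  imports MAP

  syntax KItem ::= lookup(Int) | remove(Int)

  configuration <k> $PGM:K </k>
                <store> 1 |-> 10 2 |-> 20 </store>

  // KEY is already bound by the <k> cell, so the key of the map element is
  // chosen deterministically.
  rule <k> lookup(KEY) => V ...</k>
       <store>... KEY |-> V ...</store>

  // One map element plus one Map variable: REST is bound to the bindings
  // that are left over once KEY has been matched.
  rule <k> remove(KEY) => . ...</k>
       <store> (KEY |-> _ REST:Map) => REST </store>
endmodule
```

Both rules use at most one variable of sort Map on the left-hand side, which
is the restriction described above.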
Here is an example of a simple definition that implements a very basic
variable declaration semantics using a Map to store the value of variables
(lesson-16-a.k):
module LESSON-16-A-SYNTAX
  imports INT-SYNTAX
  imports ID-SYNTAX

  syntax Exp ::= Id | Int
  syntax Decl ::= "int" Id "=" Exp ";" [strict(2)]
  syntax Pgm ::= List{Decl,""}
endmodule

module LESSON-16-A
  imports LESSON-16-A-SYNTAX
  imports BOOL

  configuration <T>
                  <k> $PGM:Pgm </k>
                  <state> .Map </state>
                </T>

  // declaration sequence
  rule <k> D:Decl P:Pgm => D ~> P ...</k>
  rule <k> .Pgm => . ...</k>

  // variable declaration
  rule <k> int X:Id = I:Int ; => . ...</k>
       <state> STATE => STATE [ X <- I ] </state>

  // variable lookup
  rule <k> X:Id => I ...</k>
       <state>... X |-> I ...</state>

  syntax Bool ::= isKResult(K) [symbol, function]
  rule isKResult(_:Int) => true
  rule isKResult(_) => false [owise]
endmodule
There are several new features in this definition. First, note we import
the module ID-SYNTAX. This module is defined in domains.md and provides a
basic syntax for identifiers. We are using the Id sort provided by this
module in this definition to implement the names of program variables. This
syntax is only imported when parsing programs, not when parsing rules. Later in
this lesson we will see how to reference specific concrete identifiers in a
rule.
Second, we introduce a single new function over the Map sort. This function,
which is represented by the symbol
syntax Map ::= Map "[" KItem "<-" KItem "]", represents the map update
operation. Other functions over the Map sort can be found in domains.md.
Finally, we have used the ... syntax on a cell containing a Map. In this
case, the meaning of <state>... Pattern ...</state>,
<state>... Pattern </state>, and <state> Pattern ...</state> are the same:
it is equivalent to writing <state> (Pattern) _:Map </state>.
Consider the following program (a.decl):
int x = 0;
int y = 1;
int a = x;
If we run this program with krun, we will get the following result:
<T>
  <k>
    .
  </k>
  <state>
    a |-> 0
    x |-> 0
    y |-> 1
  </state>
</T>
Note that krun has automatically sorted the collection for you. This doesn't
happen at runtime, so you still get the performance of a hash map, but it will
help make the output more readable.
Create a sort Stmt that is a subsort of Decl. Create a production of sort
Stmt for variable assignment in addition to the variable declaration
production. Feel free to use the syntax syntax Stmt ::= Id "=" Exp ";". Write
a rule that implements variable assignment using a map update function. Then
write the same rule using a map pattern. Test your implementations with some
programs to ensure they behave as expected.
In a previous lesson, we explained how to represent lists in the AST of a
program. However, this is not the only context where lists can be used. We also
frequently use lists in the configuration of an interpreter in order to
represent certain types of program state. For this purpose, it is generally
useful to have an associative-list sort, rather than the cons-list sorts
provided in Lesson 1.12.
The type provided by K for this purpose is the List sort, and it is also
provided in domains.md, in the LIST module. This type is also not
(currently) polymorphic. Like Map, all List terms are lists of terms of the
KItem sort.
A List pattern in K consists of zero or more list elements (as represented by
the ListItem symbol), followed by zero or one variables of sort List,
followed by zero or more list elements. An empty list is represented by
.List. These patterns can be matched in O(log(N)) time. This is the only
type of List pattern that can appear on the left-hand-side of a rule. In
other words, you are not allowed to write a List pattern on the
left-hand-side with more than one variable of sort List in it. You are,
however, allowed to write such patterns on the right-hand-side of a rule.
Note the meaning of matching on a List pattern: a list pattern with no
variables of sort List will match if the list being matched has exactly as
many elements as ListItem symbols in the pattern. It will then match if each
element in sequence matches the pattern contained in the ListItem symbol. A
list pattern with one variable of sort List operates the same way, except
that it can match any list with at least as many elements as ListItem
symbols, so long as the prefix and suffix of the list match the patterns inside
the ListItem symbols. The variable of sort List will be bound to whatever
elements are left over (.List if there are no elements left over).
The ... syntax is allowed on cells containing lists as well. In this case,
the meaning of <cell>... Pattern </cell> is the same as
<cell> _:List (Pattern) </cell>, and the meaning of <cell> Pattern ...</cell>
is the same as <cell> (Pattern) _:List </cell>. Because list patterns with
multiple variables of sort List are not allowed, it is an error to write
<cell>... Pattern ...</cell>.
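As a small illustration of list patterns combined with ..., the following
sketch uses a List as a stack. The module name, the <stack> cell, and the push
and pop commands are hypothetical:

```k
// Hypothetical sketch: names below are illustrative, not part of the tutorial.
module LIST-PATTERN-SKETCH
  imports INT
  imports LIST

  syntax KItem ::= push(Int) | "pop"

  configuration <k> $PGM:K </k>
                <stack> .List </stack>

  // The trailing ... stands for an anonymous List variable holding the
  // rest of the stack, so the new element is added at the front.
  rule <k> push(I) => . ...</k>
       <stack> (.List => ListItem(I)) ...</stack>

  // Match the first element and remove it; the rest is left untouched.
  rule <k> pop => I ...</k>
       <stack> (ListItem(I) => .List) ...</stack>
endmodule
```

Each rule has a single (implicit) variable of sort List on its left-hand
side, so both stay within the restriction described above.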
Here is an example of a simple definition that implements a very basic
function-call semantics using a List as a function stack (lesson-16-b.k):
module LESSON-16-B-SYNTAX
  imports INT-SYNTAX
  imports ID-SYNTAX

  syntax Exp ::= Id "(" ")" | Int
  syntax Stmt ::= "return" Exp ";" [strict]
  syntax Decl ::= "fun" Id "(" ")" "{" Stmt "}"
  syntax Pgm ::= List{Decl,""}
  syntax Id ::= "main" [token]
endmodule

module LESSON-16-B
  imports LESSON-16-B-SYNTAX
  imports BOOL
  imports LIST

  configuration <T>
                  <k> $PGM:Pgm ~> main () </k>
                  <functions> .Map </functions>
                  <fstack> .List </fstack>
                </T>

  // declaration sequence
  rule <k> D:Decl P:Pgm => D ~> P ...</k>
  rule <k> .Pgm => . ...</k>

  // function definitions
  rule <k> fun X:Id () { S } => . ...</k>
       <functions>... .Map => X |-> S ...</functions>

  // function call
  syntax KItem ::= stackFrame(K)
  rule <k> X:Id () ~> K => S </k>
       <functions>... X |-> S ...</functions>
       <fstack> .List => ListItem(stackFrame(K)) ...</fstack>

  // return statement
  rule <k> return I:Int ; ~> _ => I ~> K </k>
       <fstack> ListItem(stackFrame(K)) => .List ...</fstack>

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_) => false [owise]
endmodule
Notice that we have declared the production syntax Id ::= "main" [token].
Since we use the ID-SYNTAX module, this declaration is necessary in order to
be able to refer to the main identifier directly in the configuration
declaration. Our <k> cell now contains a K sequence initially: first we
process all the declarations in the program, then we call the main function.
Consider the following program (foo.func):
fun foo() { return 5; }
fun main() { return foo(); }
When we krun this program, we should get the following output:
<T>
  <k>
    5 ~> .
  </k>
  <functions>
    foo |-> return 5 ;
    main |-> return foo ( ) ;
  </functions>
  <fstack>
    .List
  </fstack>
</T>
Note that we have successfully put on the <k> cell the value returned by the
main function.
Add a term of sort Id to the stackFrame operator to keep track of the
name of the function in that stack frame. Then write a function
syntax String ::= printStackTrace(List) that takes the contents of the
<fstack> cell and pretty prints the current stack trace. You can concatenate
strings with +String in the STRING module in domains.md, and you can
convert an Id to a String with the Id2String function in the ID module.
Test this function by creating a new expression that returns the current stack
trace as a string. Make sure to update isKResult and the Exp sort as
appropriate to allow strings as values.
The final primary data structure sort in K is a set, i.e., an idempotent
unordered collection where elements are deduplicated. The sort provided by K
for this purpose is the Set sort and it is provided in domains.md in the
SET module. Like maps and lists, this type is not (currently) polymorphic.
Like Map and List, all Set terms are sets of terms of the KItem sort.
A Set pattern has the exact same restrictions as a Map pattern, except that
its elements are treated like keys, and there are no values. It has the same
performance characteristics as well. However, syntactically it is more similar
to the List sort: An empty Set is represented by .Set, but a set element
is represented by the SetItem symbol.
Matching behaves similarly to the Map sort: a set pattern with no variables
of sort Set will match if the set has exactly as many elements as SetItem
symbols, and if each element pattern matches one distinct element in the set.
A set pattern with a variable of sort Set also matches any superset of such a
set. As with maps, the elements left over will be bound to the Set variable (or
.Set if no elements are left over).
Like Map, the ... syntax on a set is syntactic sugar for an anonymous
variable of sort Set.
Here is an example of a simple modification to LESSON-16-A which uses a Set
to ensure that variables are never declared more than once. In practice, you
would likely just use the in_keys symbol over maps to test for this, but
it's still useful as an example of sets in practice:
module LESSON-16-C-SYNTAX
  imports LESSON-16-A-SYNTAX
endmodule

module LESSON-16-C
  imports LESSON-16-C-SYNTAX
  imports BOOL
  imports SET

  configuration <T>
                  <k> $PGM:Pgm </k>
                  <state> .Map </state>
                  <declared> .Set </declared>
                </T>

  // declaration sequence
  rule <k> D:Decl P:Pgm => D ~> P ...</k>
  rule <k> .Pgm => . ...</k>

  // variable declaration
  rule <k> int X:Id = I:Int ; => . ...</k>
       <state> STATE => STATE [ X <- I ] </state>
       <declared> D => D SetItem(X) </declared>
    requires notBool X in D

  // variable lookup
  rule <k> X:Id => I ...</k>
       <state>... X |-> I ...</state>
       <declared>... SetItem(X) ...</declared>

  syntax Bool ::= isKResult(K) [symbol, function]
  rule isKResult(_:Int) => true
  rule isKResult(_) => false [owise]
endmodule
Now if we krun a program containing duplicate declarations, it will get
stuck on the declaration.
Decls
, Decl
, and Stmt
which include variable and function declaration
List
and Map
to implement these operators, making
Once you have completed the above exercises, you can continue to
Lesson 1.17: Cell Multiplicity and Cell Collections.
The purpose of this lesson is to explain how you can create optional cells
and cells that repeat multiple times in a configuration using a feature called
cell multiplicity.
K allows you to specify attributes for cell productions as part of the syntax
of configuration declarations. Unlike regular productions, which use the []
syntax for attributes, configuration cells use an XML-like attribute syntax:
configuration <k color="red"> $PGM:K </k>
This configuration declaration gives the <k> cell the color red during
unparsing using the color attribute as discussed in Lesson 1.9.
However, in addition to the usual attributes for productions, there are some
other attributes that can be applied to cells with special meaning. One such
attribute is the multiplicity attribute. By default, each cell that is
declared occurs exactly once in every configuration term. However, using the
multiplicity attribute, this default behavior can be changed. There are two
values that this attribute can have: ? and *.
The first cell multiplicity we will discuss is ?. Similar to a regular
expression language, this attribute tells the compiler that this cell can
appear 0 or 1 times in the configuration. In other words, it is an
optional cell. By default, K does not create optional cells in the initial
configuration, unless that optional cell has a configuration variable inside
it. However, it is possible to override the default behavior and create that
cell initially by adding the additional cell attribute initial="".
K uses the .Bag symbol to represent the absence of any cells in a particular
rule. Consider the following module:
module LESSON-17-A
  imports INT

  configuration <k> $PGM:K </k>
                <optional multiplicity="?"> 0 </optional>

  syntax KItem ::= "init" | "destroy"

  rule <k> init => . ...</k>
       (.Bag => <optional> 0 </optional>)
  rule <k> destroy => . ...</k>
       (<optional> _ </optional> => .Bag)
endmodule
In this definition, when the init symbol is executed, the <optional> cell
is added to the configuration, and when the destroy symbol is executed, it
is removed. Any rule that matches on that cell will only match if that cell is
present in the configuration.
Create a simple definition with a Stmts sort that is a List{Stmt,""} and
a Stmt sort with the constructors
syntax Stmt ::= "enable" | "increment" | "decrement" | "disable". The
configuration should have an optional cell that contains an integer that
is created with the enable command, destroyed with the disable command,
and whose value is incremented or decremented by the increment and decrement
commands.
The second type of cell multiplicity we will discuss is *. Similar to a
regular expression language, this attribute tells the compiler that this cell
can appear 0 or more times in the configuration. In other words, it is a
cell collection. Cells with multiplicity * must be the only child of
their parent cell. As a convention, the inner cell is usually named with the
singular form of what it contains, and the outer cell with the plural form, for
example, "thread" and "threads".
All cell collections are required to have the type attribute set to either
Set or Map. A Set cell collection is represented as a set and behaves
internally the same as the Set sort, although it actually declares a new
sort. A Map cell collection is represented as a Map in which the first
subcell of the cell collection is the key and the remaining cells are the
value.
For example, consider the following module:
module LESSON-17-B
  imports INT
  imports BOOL
  imports ID-SYNTAX

  syntax Stmt ::= Id "=" Exp ";" [strict(2)]
                | "return" Exp ";" [strict]
  syntax Stmts ::= List{Stmt,""}
  syntax Exp ::= Id
               | Int
               | Exp "+" Exp [seqstrict]
               | "spawn" "{" Stmts "}"
               | "join" Exp ";" [strict]

  configuration <threads>
                  <thread multiplicity="*" type="Map">
                    <id> 0 </id>
                    <k> $PGM:K </k>
                  </thread>
                </threads>
                <state> .Map </state>
                <next-id> 1 </next-id>

  rule <k> X:Id => I:Int ...</k>
       <state>... X |-> I ...</state>
  rule <k> X:Id = I:Int ; => . ...</k>
       <state> STATE => STATE [ X <- I ] </state>
  rule <k> S:Stmt Ss:Stmts => S ~> Ss ...</k>
  rule <k> I1:Int + I2:Int => I1 +Int I2 ...</k>

  rule <thread>... <k> spawn { Ss } => NEXTID ...</k> ...</thread>
       <next-id> NEXTID => NEXTID +Int 1 </next-id>
       (.Bag => <thread> <id> NEXTID </id> <k> Ss </k> </thread>)

  rule <thread>... <k> join ID:Int ; => I ...</k> ...</thread>
       (<thread> <id> ID </id> <k> return I:Int ; ...</k> </thread> => .Bag)

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_) => false [owise]
endmodule
This module implements a very basic fork/join semantics. The spawn expression
spawns a new thread to execute a sequence of statements and returns a thread
id, and the join statement waits until a thread executes return and then
returns the return value of the thread.
Note something quite novel here: the <k> cell is inside a cell of
multiplicity *. Since the <k> cell is just a regular cell (mostly), this
is perfectly allowable. Rules that don't mention a specific thread are
automatically completed to match any thread.
When you execute programs in this language, the cells in the cell collection
get sorted and printed like any other collection, but they still display like
cells. Rules in this language also benefit from all the structural power of
cells, allowing you to omit cells you don't care about or complete the
configuration automatically. This allows you to have the power of cells while
still being a collection under the hood.
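As a further, smaller sketch of a Map cell collection, consider the following
module. The module name, the <accounts>, <account> and <balance> cells, and
the open and close commands are all hypothetical; only the multiplicity and
type attributes and the use of .Bag follow the lesson:

```k
// Hypothetical sketch: names below are illustrative, not part of the tutorial.
module CELL-COLLECTION-SKETCH
  imports INT

  syntax KItem ::= open(Int) | close(Int)

  configuration <k> $PGM:K </k>
                <accounts>
                  <account multiplicity="*" type="Map">
                    <id> 0 </id>            // first subcell: the key
                    <balance> 0 </balance>  // remaining subcell: the value
                  </account>
                </accounts>

  // Add a new element to the cell collection.
  rule <k> open(ID) => . ...</k>
       (.Bag => <account> <id> ID </id> <balance> 0 </balance> </account>)

  // Remove the element whose key matches ID.
  rule <k> close(ID) => . ...</k>
       (<account> <id> ID </id> <balance> _ </balance> </account> => .Bag)
endmodule
```

Neither rule mentions the <accounts> parent cell; it is filled in by
configuration abstraction in the same way as <threads> in LESSON-17-B.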
Map
is now a cell collection. Run some programs
Once you have completed the above exercises, you can continue to
Lesson 1.18: Term Equality and the Ternary Operator.
The purpose of this lesson is to introduce how to compare equality of terms in
K, and how to put conditional expressions directly into the right-hand side of
rules.
One major way you can compare whether two terms are equal in K is to simply
match both terms with a variable with the same name. This will only succeed
in matching if the two terms are equal structurally. However, sometimes this
is impractical, and it is useful to have access to a way to actually compare
whether two terms in K are equal. The operator for this is found in
domains.md in the K-EQUAL module. The operator is ==K and takes two terms of
sort K and returns a Bool. It returns true if they are equal. This includes
equality over builtin types such as Map and Set where equality is not purely
structural in nature. However, it does not include any notion of semantic
equality over user-defined syntax. The inverse symbol for inequality is =/=K.
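A minimal sketch of both operators in use follows. The module name and the
same and differ constructs are hypothetical:

```k
// Hypothetical sketch: names below are illustrative, not part of the tutorial.
module K-EQUAL-SKETCH
  imports INT
  imports BOOL
  imports K-EQUAL

  syntax KItem ::= same(KItem, KItem) | differ(KItem, KItem)

  // Each construct is replaced by the Boolean result of the comparison.
  rule same(A, B) => A ==K B
  rule differ(A, B) => A =/=K B
endmodule
```

The first rule could also be approximated by matching same(A, A), but that
form gives no way to produce false when the two arguments differ.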
One way to introduce conditional logic in K is to have two separate rules,
each with a side condition (or one rule with a side condition and another with
the owise attribute). However, sometimes it is useful to explicitly write
a conditional expression directly in the right-hand side of a rule. For this
purpose, K defines one more operator in the K-EQUAL module, which corresponds
to the usual ternary operator found in many languages. Here is an example of its
usage (lesson-18.k):
module LESSON-18
  imports INT
  imports BOOL
  imports K-EQUAL

  syntax Exp ::= Int | Bool | "if" "(" Exp ")" Exp "else" Exp [strict(1)]

  syntax Bool ::= isKResult(K) [function, symbol]
  rule isKResult(_:Int) => true
  rule isKResult(_:Bool) => true

  rule if (B:Bool) E1:Exp else E2:Exp => #if B #then E1 #else E2 #fi
endmodule
Note the symbol on the right-hand side of the final rule. This symbol is
polymorphic: B must be of sort Bool, but E1 and E2 could have been
any sort so long as both were of the same sort, and the sort of the entire
expression becomes equal to that sort. K supports polymorphic built-in
operators, but does not yet allow users to write their own polymorphic
productions.
The behavior of this function is to evaluate the Boolean expression to a
Boolean, then pick one of the two children and return it based on whether the
Boolean is true or false. Please note that it is not a good idea to use this
symbol in cases where one or both of the children is potentially undefined
(for example, an integer expression that divides by zero). While the default
implementation is smart enough to only evaluate the branch that happens to be
picked, this will not be true when we begin to do program verification. If
you need short circuiting behavior, it is better to use a side condition.
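As a sketch of a use where both branches are always defined, the following
function picks the larger of two integers. The module name and the maxOf
function are hypothetical:

```k
// Hypothetical sketch: names below are illustrative, not part of the tutorial.
module TERNARY-SKETCH
  imports INT
  imports BOOL
  imports K-EQUAL

  syntax Int ::= maxOf(Int, Int) [function]

  // Both branches are plain variables, so neither can be undefined.
  rule maxOf(A, B) => #if A >=Int B #then A #else B #fi
endmodule
```

Because neither branch involves a partial operation such as division, this
usage avoids the concern about undefined children raised above.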
Write a function in K that takes two terms of sort K and returns an
Int: the Int should be 0 if the terms are equal and 1 if the terms are
unequal.
Modify your solution to Lesson 1.16, Exercise 1 and introduce an if Stmt
to the syntax of the language, then implement it using the #if symbol.
Make sure to write tests for the resulting interpreter.
Once you have completed the above exercises, you can continue to
Lesson 1.19: Debugging with GDB.
The purpose of this lesson is to teach how to debug your K interpreter using
the K-language support provided in GDB or
LLDB.
This lesson has been written with GDB support on Linux in mind. Unfortunately,
on macOS, GDB has limited support. To address this, we have introduced early
experimental support for debugging with LLDB on macOS. In some cases, the
features supported by LLDB are slightly different to those supported by GDB; the
tutorial text will make this clear where necessary. If you use macOS with an
LLVM version older than 15, you may need to upgrade it to use LLDB
correctly. If you encounter an issue on either operating system, please open an
issue against the K repository.
On Linux, you will need GDB in order to complete this lesson. If you do not
already have GDB installed, then do so. Steps to install GDB are outlined in
this GDB Tutorial.
On macOS, LLDB should already have been installed with K's build dependencies
(whether you have built K from source, or installed it using kup or Homebrew).
The first thing necessary in order to debug a K interpreter is to build the
interpreter with full debugging support enabled. This can be done relatively
simply. First, run kompile with the command line flag --enable-llvm-debug.
The resulting compiled K definition will be ready to support debugging.
Once you have a compiled K definition and a program you wish to debug, you can
start the debugger by passing the --debugger flag to krun. This will
automatically load the program you are executing into GDB and drop you into a
GDB shell ready to start executing the program.
As an example, consider the following K definition (lesson-19-a.k):
module LESSON-19-A
  imports INT

  rule I => I +Int 1
    requires I <Int 100
endmodule
If we compile this definition with kompile lesson-19-a.k --enable-llvm-debug,
and run the program 0 in the debugger with krun -cPGM=0 --debugger, we will
see the following output (roughly, and depending on which platform you are
using):
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./lesson-19-a-kompiled/interpreter...
warning: File "/home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-a-kompiled/interpreter" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
add-auto-load-safe-path /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-a-kompiled/interpreter
line to your configuration file "/home/dwightguth/.gdbinit".
To completely disable this security protection add
set auto-load safe-path /
line to your configuration file "/home/dwightguth/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual. E.g., run from the shell:
info "(gdb)Auto-loading safe path"
(gdb)
To take full advantage of the GDB features of K, you should follow the first
command listed in this output message and add the corresponding
add-auto-load-safe-path command to your ~/.gdbinit file as prompted.
Please note that the path will be different on your machine than the one
listed above. Adding directories to the "load safe path" effectively tells GDB
to trust those directories. All content under a given directory will be
recursively trusted, so if you want to avoid having to add paths to the "load
safe path" every time you kompile a different K definition, then you can just
trust a minimal directory containing all your kompiled files; however, do not
choose a top-level directory containing arbitrary files as this amounts to
trusting arbitrary files and is a security risk. More info on the load safe path
can be found here.
(lldb) target create "./lesson-19-a-kompiled/interpreter"
warning: 'interpreter' contains a debug script. To run this script in this debug session:
command script import "/Users/brucecollie/code/scratch/lesson-19-a-kompiled/interpreter.dSYM/Contents/Resources/Python/interpreter.py"
To run all discovered debug scripts in this session:
settings set target.load-script-from-symbol-file true
Current executable set to '/Users/brucecollie/code/scratch/lesson-19-a-kompiled/interpreter' (x86_64).
(lldb) settings set -- target.run-args ".krun-2023-03-20-11-22-46-TcYt9ffhb2/tmp.in.RupiLwHNfn" "-1" ".krun-2023-03-20-11-22-46-TcYt9ffhb2/result.kore"
(lldb)
LLDB applies slightly different security policies to GDB. To load K's debugging
scripts for this session only, you can run the command script import line at
the LLDB prompt. The loaded scripts will not persist across debugging sessions
if you do this. It is also possible to configure LLDB to automatically load the
K scripts when an interpreter is started in LLDB; doing so requires a slightly
less broad permission than GDB.
On macOS, the .dSYM directory that contains debugging symbols for an
executable can also contain Python scripts in Contents/Resources/Python. If
there is a Python script with a name matching the name of the current executable
(here, interpreter and interpreter.py), it will be automatically loaded if
the target.load-script-from-symbol-file setting is set. You can therefore add
the settings set command to your ~/.lldbinit without enabling full arbitrary
code execution, but you should be aware of the paths from which code can be
executed if you do so.
LLDB Note: the k start and k step commands are currently not implemented in
the K LLDB scripts. To work around this limitation temporarily, you can run
process launch --stop-at-entry instead of k start. To emulate k step, first
run rbreak k_step once, then run continue instead of each k step. We hope to
address these limitations soon.
The most basic commands you can execute in the K GDB session are to run your
program or to step through it. The first can be accomplished using GDB's
built-in run command. This will automatically start the program and begin
executing it. It will continue until the program aborts or finishes, or the
debugger is interrupted with Ctrl-C.
Sometimes you want finer-grained control over how you proceed through the
program you are debugging. To step through the rule applications in your
program, you can use the k start and k step GDB commands.
k start is similar to the built-in start command in that it starts the
program and then immediately breaks before doing any work. However, unlike
the start command, which breaks as soon as the main function of the program
is entered, the k start command will initialize the rewriter, evaluate the
initial configuration, and break immediately prior to applying any rewrite
steps.
In the example above, here is what we see when we run the k start command:
Temporary breakpoint 1 at 0x239210
Starting program: /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-a-kompiled/interpreter .krun-2021-08-13-14-10-50-sMwBkbRicw/tmp.in.01aQt85TaA -1 .krun-2021-08-13-14-10-50-sMwBkbRicw/result.kore
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Temporary breakpoint 1, 0x0000000000239210 in main ()
0x0000000000231890 in step (subject=<k>
0 ~> .
</k>)
(gdb)
As you can see, we are stopped at the step function in the interpreter.
This function is responsible for taking top-level rewrite steps. The subject
parameter to this function is the current K configuration.
We can step through K rewrite steps one at a time by running the k step
command. By default, this takes a single rewrite step (including any function
rule applications that are part of that step).
Here is what we see when we run that command:
Continuing.
Temporary breakpoint -22, 0x0000000000231890 in step (subject=<k>
1 ~> .
</k>)
(gdb)
As we can see, we have taken a single rewrite step. We can also pass a number
to the k step command which indicates the number of rewrite steps to take.
Here is what we see if we run k step 10:
Continuing.
Temporary breakpoint -23, 0x0000000000231890 in step (subject=<k>
11 ~> .
</k>)
(gdb)
As we can see, ten rewrite steps were taken.
The next important step in debugging an application in GDB is to be able to
set breakpoints. Generally speaking, there are three types of breakpoints we
are interested in for a K semantics: a breakpoint when a particular function
is called, a breakpoint when a particular rule is applied, and a breakpoint
when a side condition of a rule is evaluated.
The easiest way to do the first two things is to set a breakpoint on the
line of code containing the function or rule.
For example, consider the following K definition (lesson-19-b.k):
module LESSON-19-B
  imports BOOL

  syntax Bool ::= isBlue(Fruit) [function]
  syntax Fruit ::= Blueberry() | Banana()
  rule isBlue(Blueberry()) => true
  rule isBlue(Banana()) => false

  rule F:Fruit => isBlue(F)
endmodule
Once this program has been compiled for debugging, we can run the program
Blueberry(). We can then set a breakpoint that stops when the isBlue
function is called with the following command in GDB:
break lesson-19-b.k:4
Similarly, in LLDB, run:
breakpoint set --file lesson-19-b.k --line 4
Here is what we see if we set this breakpoint and then run the interpreter:
(gdb) break lesson-19-b.k:4
Breakpoint 1 at 0x231040: file /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b.k, line 4.
(gdb) run
Starting program: /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b-kompiled/interpreter .krun-2021-08-13-14-20-27-vXOQmV6lwS/tmp.in.fga98yqXlc -1 .krun-2021-08-13-14-20-27-vXOQmV6lwS/result.kore
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, LblisBlue'LParUndsRParUnds'LESSON-19-B'Unds'Bool'Unds'Fruit (_1=Blueberry ( )) at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b.k:4
4 syntax Bool ::= isBlue(Fruit) [function]
(gdb)
(lldb) breakpoint set --file lesson-19-b.k --line 4
Breakpoint 1: where = interpreter`LblisBlue'LParUndsRParUnds'LESSON-19-B'Unds'Bool'Unds'Fruit + 20 at lesson-19-b.k:4:19, address = 0x0000000100003ff4
(lldb) run
Process 50546 launched: '/Users/brucecollie/code/scratch/lesson-19-b-kompiled/interpreter' (x86_64)
Process 50546 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x0000000100003ff4 interpreter`LblisBlue'LParUndsRParUnds'LESSON-19-B'Unds'Bool'Unds'Fruit(_1=Blueberry ( )) at lesson-19-b.k:4:19
1 module LESSON-19-B
2 imports BOOL
3
-> 4 syntax Bool ::= isBlue(Fruit) [function]
5 syntax Fruit ::= Blueberry() | Banana()
6 rule isBlue(Blueberry()) => true
7 rule isBlue(Banana()) => false
(lldb)
As we can see, we have stopped at the point where we are evaluating that
function. The value _1 that is a parameter to that function shows the
value passed to the function by the caller.
We can also break when the isBlue(Blueberry()) => true rule applies by simply
changing the line number to the line number of that rule:
(gdb) break lesson-19-b.k:6
Breakpoint 1 at 0x2af710: file /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b.k, line 6.
(gdb) run
Starting program: /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b-kompiled/interpreter .krun-2021-08-13-14-32-36-7kD0ic7XwD/tmp.in.8JNH5Qtmow -1 .krun-2021-08-13-14-32-36-7kD0ic7XwD/result.kore
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, apply_rule_138 () at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b.k:6
6 rule isBlue(Blueberry()) => true
(gdb)
(lldb) breakpoint set --file lesson-19-b.k --line 6
Breakpoint 1: where = interpreter`apply_rule_140 at lesson-19-b.k:6:8, address = 0x0000000100004620
(lldb) run
Process 50681 launched: '/Users/brucecollie/code/scratch/lesson-19-b-kompiled/interpreter' (x86_64)
Process 50681 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x0000000100004620 interpreter`apply_rule_140 at lesson-19-b.k:6:8
3
4 syntax Bool ::= isBlue(Fruit) [function]
5 syntax Fruit ::= Blueberry() | Banana()
-> 6 rule isBlue(Blueberry()) => true
7 rule isBlue(Banana()) => false
8
9 rule F:Fruit => isBlue(F)
(lldb)
We can also do the same with a top-level rule:
(gdb) break lesson-19-b.k:9
Breakpoint 1 at 0x2aefa0: lesson-19-b.k:9. (2 locations)
(gdb) run
Starting program: /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b-kompiled/interpreter .krun-2021-08-13-14-33-13-9fC8Sz4aO3/tmp.in.jih1vtxSiQ -1 .krun-2021-08-13-14-33-13-9fC8Sz4aO3/result.kore
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, apply_rule_107 (Var'Unds'DotVar0=<generatedCounter>
0
</generatedCounter>, Var'Unds'DotVar1=., VarF=Blueberry ( )) at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-b.k:9
9 rule F:Fruit => isBlue(F)
(gdb)
(lldb) breakpoint set --file lesson-19-b.k --line 9
Breakpoint 1: 2 locations.
(lldb) run
Process 50798 launched: '/Users/brucecollie/code/scratch/lesson-19-b-kompiled/interpreter' (x86_64)
Process 50798 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x0000000100003f2e interpreter`apply_rule_109(Var'Unds'DotVar0=<generatedCounter>
0
</generatedCounter>, Var'Unds'DotVar1=., VarF=Blueberry ( )) at lesson-19-b.k:9:8
6 rule isBlue(Blueberry()) => true
7 rule isBlue(Banana()) => false
8
-> 9 rule F:Fruit => isBlue(F)
10 endmodule
(lldb)
Unlike the function rule above, we see several parameters to this function.
Together, these make up the substitution that was matched for the rule.
Variables only appear in this substitution if they are actually used on the
right-hand side of the rule.
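To make the point about unused variables concrete, here is a minimal sketch of a definition in which one matched variable is used on the right-hand side and another is not. The module name SUBSTITUTION-SKETCH, the pair constructor, and the first label are hypothetical and are not part of the tutorial's lesson files.

```k
module SUBSTITUTION-SKETCH
  imports INT

  // Hypothetical constructor used only for this illustration.
  syntax Pair ::= pair(Int, Int)

  // A is used on the right-hand side; _B is matched but never used.
  rule [first]: pair(A:Int, _B:Int) => A
endmodule
```

By analogy with the apply_rule output shown above, one would expect a breakpoint on this rule to list a VarA parameter but no parameter for _B. This expectation follows from the description above and has not been checked against a particular backend version.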
Sometimes it is inconvenient to set the breakpoint based on a line number.
It is also possible to set a breakpoint based on the rule label of a particular
rule. Consider the following definition (lesson-19-c.k):
module LESSON-19-C
  imports INT
  imports BOOL

  syntax Bool ::= isEven(Int) [function]
  rule [isEven]: isEven(I) => true requires I %Int 2 ==Int 0
  rule [isOdd]: isEven(I) => false requires I %Int 2 =/=Int 0

endmodule
We will run the program isEven(4). We can set a breakpoint for when a rule
applies by means of the MODULE-NAME.label.rhs syntax:
(gdb) break LESSON-19-C.isEven.rhs
Breakpoint 1 at 0x2afda0: file /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c.k, line 6.
(gdb) run
Starting program: /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c-kompiled/interpreter .krun-2021-08-13-14-40-29-LNNT8YEZ61/tmp.in.ZG93vWCGGC -1 .krun-2021-08-13-14-40-29-LNNT8YEZ61/result.kore
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, LESSON-19-C.isEven.rhs () at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c.k:6
6 rule [isEven]: isEven(I) => true requires I %Int 2 ==Int 0
(gdb)
(lldb) breakpoint set --name LESSON-19-C.isEven.rhs
Breakpoint 1: where = interpreter`LESSON-19-C.isEven.rhs at lesson-19-c.k:6:18, address = 0x00000001000038e0
(lldb) run
Process 51205 launched: '/Users/brucecollie/code/scratch/lesson-19-c-kompiled/interpreter' (x86_64)
Process 51205 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x00000001000038e0 interpreter`LESSON-19-C.isEven.rhs at lesson-19-c.k:6:18
3 imports BOOL
4
5 syntax Bool ::= isEven(Int) [function]
-> 6 rule [isEven]: isEven(I) => true requires I %Int 2 ==Int 0
7 rule [isOdd]: isEven(I) => false requires I %Int 2 =/=Int 0
8
9 endmodule
(lldb)
We can also set a breakpoint for when a rule's side condition is evaluated
by means of the MODULE-NAME.label.sc syntax:
(gdb) break LESSON-19-C.isEven.sc
Breakpoint 1 at 0x2afd70: file /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c.k, line 6.
(gdb) run
Starting program: /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c-kompiled/interpreter .krun-2021-08-13-14-41-48-1BoGfJRbYc/tmp.in.kg4F8cwfCe -1 .krun-2021-08-13-14-41-48-1BoGfJRbYc/result.kore
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, LESSON-19-C.isEven.sc (VarI=4) at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c.k:6
6 rule [isEven]: isEven(I) => true requires I %Int 2 ==Int 0
(gdb) finish
Run till exit from #0 LESSON-19-C.isEven.sc (VarI=4) at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c.k:6
0x00000000002b2662 in LblisEven'LParUndsRParUnds'LESSON-19-C'Unds'Bool'Unds'Int (_1=4) at /home/dwightguth/kframework-5.0.0/k-distribution/k-tutorial/1_basic/19_debugging/lesson-19-c.k:5
5 syntax Bool ::= isEven(Int) [function]
Value returned is $1 = true
(gdb)
(lldb) breakpoint set --name LESSON-19-C.isEven.sc
Breakpoint 1: where = interpreter`LESSON-19-C.isEven.sc + 1 at lesson-19-c.k:6:18, address = 0x00000001000038c1
(lldb) run
Process 52530 launched: '/Users/brucecollie/code/scratch/lesson-19-c-kompiled/interpreter' (x86_64)
Process 52530 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x00000001000038c1 interpreter`LESSON-19-C.isEven.sc(VarI=0x0000000101800088) at lesson-19-c.k:6:18
3 imports BOOL
4
5 syntax Bool ::= isEven(Int) [function]
-> 6 rule [isEven]: isEven(I) => true requires I %Int 2 ==Int 0
7 rule [isOdd]: isEven(I) => false requires I %Int 2 =/=Int 0
8
9 endmodule
(lldb) finish
Process 52649 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step out
Return value: (bool) $0 = true
frame #0: 0x00000001000069e5 interpreter`LblisEven'LParUndsRParUnds'LESSON-19-C'Unds'Bool'Unds'Int(_1=0x0000000101800088) at lesson-19-c.k:5:19
2 imports INT
3 imports BOOL
4
-> 5 syntax Bool ::= isEven(Int) [function]
6 rule [isEven]: isEven(I) => true requires I %Int 2 ==Int 0
7 rule [isOdd]: isEven(I) => false requires I %Int 2 =/=Int 0
8
(lldb)
Here we have used the built-in command finish to tell us whether the side
condition returned true or not. Note that once again, we see the substitution
that was matched from the left-hand side. Like before, a variable will only
appear here if it is used in the side condition.
Sometimes it is useful to try to determine why a particular rule did or did
not apply. K provides some basic debugging commands which make it easier
to determine this.
Consider the following K definition (lesson-19-d.k):
module LESSON-19-D

  syntax Foo ::= foo(Bar)
  syntax Bar ::= bar(Baz) | bar2(Baz)
  syntax Baz ::= baz() | baz2()

  rule [baz]: foo(bar(baz())) => .K
endmodule
Suppose we try to run the program foo(bar(baz2())). It is obvious from this
example why the rule in this definition will not apply. However, in practice,
such cases are not always obvious. You might look at a rule and not immediately
spot why it didn't apply on a particular term. For this reason, it can be
useful to get the debugger to provide a log about how it tried to match that
term. You can do this with the k match command. If you are stopped after
having run k start or k step, you can obtain this log for any rule after
any step by running the command k match MODULE.label subject for a particular
top-level rule label.
For example, with the baz rule above, we get the following output:
(gdb) k match LESSON-19-D.baz subject
Subject:
baz2 ( )
does not match pattern:
baz ( )
(lldb) k match LESSON-19-D.baz subject
Subject:
baz2 ( )
does not match pattern:
baz ( )
As we can see, it provided the exact subterm which did not match against the
rule, as well as the particular subpattern it ought to have matched against.
This command does not actually take any rewrite steps. In the event that
matching actually succeeds, you will still need to run the k step command
to advance to the next step.
In addition to the functionality provided above, you have the full power of
GDB or LLDB at your disposal when debugging. Some features are not particularly
well-adapted to K code and may require more advanced knowledge of the
term representation or implementation to use effectively, but anything that
can be done in GDB or LLDB can in theory be done using this debugging
functionality. We suggest you refer to the GDB Documentation or LLDB Tutorial
if you want to try to do something and are unsure as to how.
isKResult function and observe the state of the
k match command to report the reason why
Once you have completed the above exercises, you can continue to
Lesson 1.20: K Backends and the Haskell Backend.
The purpose of this lesson is to teach about the multiple backends of K,
in particular the Haskell Backend which is the complement of the backend we
have been using so far.
Thus far, we have not discussed the distinction between the K frontend and
the K backends at all. We have simply assumed that if you run kompile on a
K definition, there will be a compiler backend that will allow you to execute
the K definition you have compiled.
K actually has multiple different backends. The one we have been using so far
implicitly, the default backend, is called the LLVM Backend. It is
designed to support efficient, optimized concrete execution and search. It
does this by compiling your K definition to LLVM bitcode and then using LLVM
to generate machine code for it that is compiled and linked and executed.
However, K is a formal methods toolkit at the end of the day, and the primary
goal many people have when defining a programming language in K is to
ultimately be able to perform more advanced verification on programs in their
programming language.
It is for this purpose that K also provides the Haskell Backend, so called
because it is implemented in Haskell. While we will cover the features of the
Haskell Backend in more detail in the next two lessons, the important thing to
understand is that it is a separate backend which is optimized for more formal
reasoning about programming languages. While it is capable of performing
concrete execution, it does not do so as efficiently as the LLVM Backend.
In exchange, it provides more advanced features.
You can choose which backend to use to compile a K definition by means of the
--backend flag to kompile. If you do not specify this flag, the effect is the
same as specifying --backend llvm. To use the Haskell Backend instead, you can
simply run kompile --backend haskell on a particular K definition.
As an example, here is a simple K definition that we have seen before in the
previous lesson (lesson-20.k):
module LESSON-20
  imports INT

  rule I => I +Int 1 requires I <Int 100
endmodule
Previously we compiled this definition using the LLVM Backend, but if we
instead execute the command kompile lesson-20.k --backend haskell, we
will get an interpreter for this K definition that is implemented in Haskell
instead. Unlike the default LLVM Backend, the Haskell Backend is not a
compiler per se. It does not generate new Haskell code corresponding to your
programming language and then compile and execute it. Instead, it is a single
interpreter, implemented in Haskell, that reads the IR generated by kompile
and is capable of interpreting any K definition.
Note that on arm64 macOS (Apple Silicon), there is a known issue with the
Compact library that causes crashes in the Haskell backend. Pass the additional
flag --no-haskell-binary to kompile to resolve this. This flag is also needed
when using krun.
Try running the program 0 in this K definition on the Haskell Backend and
compare the final configuration to what you would get compiling the same
definition with the LLVM Backend.
As a quick note, K does provide one other backend, which exists primarily as
legacy code which should be considered deprecated. This is the
Java Backend. The Java Backend is essentially a precursor to the Haskell
Backend. We will not cover this backend in any detail since it is deprecated,
but we still mention it here for the purposes of understanding.
--search to krun when using the LLVM backend.
Once you have completed the above exercises, you can continue to
Lesson 1.21: Unification and Symbolic Execution.
The purpose of this lesson is to teach the basic concepts of symbolic execution
in order to introduce the unique capabilities of the Haskell Backend at a
conceptual level.
Thus far, all of the programs we have run using K have been concrete
configurations. What this means is that the configuration we use to initialize
the K rewrite engine is concrete; in other words, it contains no logical
variables. The LLVM Backend is a concrete execution engine, meaning that
it is only capable of rewriting concrete configurations.
By contrast, the Haskell Backend performs symbolic execution, which is
capable of rewriting any configuration, including those where parts of the
configuration are symbolic, i.e., contain variables or uninterpreted
functions.
Previously, we have introduced the concept that K rewrite rules operate by
means of pattern matching: the current configuration being rewritten is pattern
matched against the left-hand side of the rewrite rule, and the substitution
is used in order to construct a new term from the right-hand side. In symbolic
execution, we use unification instead of pattern matching. To summarize,
unification behaves akin to a two-way pattern matching where both the
configuration and the left-hand side of the rule can contain variables, and
the algorithm generates a most general unifier containing substitutions for
the variables in both which will make both terms equal.
Unification by itself cannot completely solve the problem of symbolic
execution. One task symbolic execution must perform is to identify whether
a particular symbolic term is feasible, that is to say, that there actually
exists a concrete instantiation of that term such that all the logical
constraints on that term can actually be satisfied. The Haskell Backend
delegates this task to Z3, an SMT solver. This solver is used to periodically
trim configurations that are determined to be mathematically infeasible.
The final component of symbolic execution consists of the task of introducing
symbolic terms into the configuration. This can be done in one of two ways.
First, the term being passed to krun can actually be symbolic. This is less
frequently used because it requires the user to construct an AST that contains
variables, something which our current parsing capabilities are not
well-equipped to do. The second, more common, way of introducing symbolic
terms into a configuration consists of writing rules where there exists an
existentially quantified variable on the right-hand side of the rule that does
not exist on the left-hand side of the rule.
In order to prevent users from writing such rules by accident, K requires
that such variables begin with the ? prefix. For example, here is a rule
that rewrites a constructor foo to a symbolic integer:
rule <k> foo => ?X:Int ...</k>
When this rule applies, a fresh variable is introduced to the configuration, which
then is unified against the rules that might apply in order to symbolically
execute that configuration.
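As a rough illustration of how a freshly introduced variable is unified against later rules, consider the following sketch. The module name UNIFICATION-SKETCH and the symbols start and foo are hypothetical names chosen for this example; the definition would need to be compiled with the Haskell Backend.

```k
module UNIFICATION-SKETCH
  imports INT

  // Hypothetical syntax used only for this illustration.
  syntax Foo ::= "start" | foo(Int, Int)

  // Introduces a symbolic integer ?A into the configuration.
  rule <k> start => foo(?A:Int, 5) ... </k>

  // The left-hand side foo(3, Y) unifies with foo(?A, 5)
  // under the substitution ?A = 3, Y = 5.
  rule <k> foo(3, Y:Int) => Y ... </k>
endmodule
```

Running the program start symbolically, one would expect two outcomes: one in which ?A equals 3 and the second rule has rewritten the term to 5, and one in which ?A differs from 3 and foo(?A, 5) cannot be rewritten further. This is a sketch of the expected behavior under the description above, not recorded output.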
ensures clauses
We also introduce here a new feature of K rules that applies when a rule
has this type of variable on the right-hand side: the ensures clause.
An ensures clause is similar to a requires clause and can appear after
a rule body, or after a requires clause. The ensures clause is used to
introduce constraints that might apply to the variable that was introduced by
that rule. For example, we could write the rule above with the additional
constraint that the symbolic integer that was introduced must be less than
five, by means of the following rule:
rule <k> foo => ?X:Int ...</k> ensures ?X <Int 5
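Since an ensures clause may also follow a requires clause, a small sketch of a rule using both may be helpful. The module name ENSURES-SKETCH and the below constructor are hypothetical names introduced only for this example.

```k
module ENSURES-SKETCH
  imports INT
  imports BOOL

  // Hypothetical constructor used only for this illustration.
  syntax KItem ::= below(Int)

  // The requires clause restricts when the rule applies (N must be
  // positive); the ensures clause constrains the fresh variable ?Y
  // to lie in the range [0, N).
  rule <k> below(N:Int) => ?Y:Int ... </k>
    requires N >Int 0
    ensures 0 <=Int ?Y andBool ?Y <Int N
endmodule
```

The requires clause is a precondition checked against the matched term, while the ensures clause adds constraints to the resulting configuration.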
Putting all these pieces together, it is possible to use the Haskell Backend
to perform symbolic reasoning about a particular K module, determining all the
possible states that can be reached by a symbolic configuration.
For example, consider the following K definition (lesson-21.k):
module LESSON-21
  imports INT

  rule <k> 0 => ?X:Int ... </k> ensures ?X =/=Int 0
  rule <k> X:Int => 5 ... </k> requires X >=Int 10
endmodule
When we symbolically execute the program 0, we get the following output
from the Haskell Backend:
<k>
5 ~> .
</k>
#And
{
true
#Equals
?X:Int >=Int 10
}
#And
#Not ( {
?X:Int
#Equals
0
} )
#Or
<k>
?X:Int ~> .
</k>
#And
#Not ( {
true
#Equals
?X:Int >=Int 10
} )
#And
#Not ( {
?X:Int
#Equals
0
} )
Note some new symbols introduced by this configuration: #And, #Or, and
#Equals. While andBool, orBool, and ==K represent functions of sort Bool,
#And, #Or, and #Equals are matching logic connectives. We will discuss
matching logic in more detail later in the tutorial, but the basic idea is
that these symbols represent Boolean operators over the domain of
configurations and constraints, as opposed to over the Bool sort.
Notice that the configuration listed above is a disjunction of conjunctions.
This is the most common form of output that can be produced by the Haskell
Backend. In this case, each conjunction consists of a configuration and a set
of constraints. What this conjunction describes, essentially, is a
configuration and a set of information that was derived to be true while
rewriting that configuration.
As with the --search output we saw in a previous lesson, we have multiple
disjuncts because there are multiple possible output states for this program,
depending on whether or not the second rule applied. In the first case, we see
that ?X is greater than or equal to 10, so the second rule applied, rewriting
the symbolic integer to the concrete integer 5. In the second case, we see
that the second rule did not apply because ?X is less than 10. Moreover,
because of the ensures clause on the first rule, we know that ?X is not zero,
therefore the first rule will not apply a second time. If we had omitted this
constraint, we would have ended up infinitely applying the first rule, leading
to krun not terminating.
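For comparison, the variant without the constraint can be sketched as follows. The module name LESSON-21-NO-ENSURES is hypothetical; the only change from lesson-21.k is that the ensures clause has been removed.

```k
module LESSON-21-NO-ENSURES
  imports INT

  // Without the ensures clause, nothing rules out ?X being 0, so the
  // branch in which ?X equals 0 can apply this rule again, and again.
  rule <k> 0 => ?X:Int ... </k>
  rule <k> X:Int => 5 ... </k> requires X >=Int 10
endmodule
```

If you experiment with this variant, it may be wise to bound execution, for example with krun's --depth option if your version provides it, since symbolic execution of the program 0 is not expected to terminate.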
In the next lesson, we will cover how symbolic execution forms the backbone
of deductive program verification in K and how we can use K to prove programs
correct against a specification.
LESSON-21 that rewrites odd integers greater than 0
after adding this
Once you have completed the above exercises, you can continue to
Lesson 1.22: Basics of Deductive Program Verification using K.
In this lesson, you will familiarize yourself with the basics of using K for
deductive program verification.
We base this lesson on a simple programming language with functions,
assignment, if conditionals, and while loops. Take your time to study its
formalization below (lesson-22.k):
module LESSON-22-SYNTAX
  imports INT-SYNTAX
  imports BOOL-SYNTAX
  imports ID-SYNTAX

  syntax Exp ::= IExp | BExp

  syntax IExp ::= Id | Int

  syntax KResult ::= Int | Bool | Ints

  // Take this sort structure:
  //
  //     IExp
  //    /    \
  //  Int     Id
  //
  // Through the List{_, ","} functor.
  // Must add a `Bot`, for a common subsort for the empty list.

  syntax Bot
  syntax Bots ::= List{Bot, ","} [klabel(exps)]
  syntax Ints ::= List{Int, ","} [klabel(exps)]
                | Bots
  syntax Ids ::= List{Id, ","} [klabel(exps)]
               | Bots
  syntax Exps ::= List{Exp, ","} [klabel(exps), seqstrict]
                | Ids | Ints

  syntax IExp ::= "(" IExp ")" [bracket]
                | IExp "+" IExp [seqstrict]
                | IExp "-" IExp [seqstrict]
                > IExp "*" IExp [seqstrict]
                | IExp "/" IExp [seqstrict]
                > IExp "^" IExp [seqstrict]
                | Id "(" Exps ")" [strict(2)]

  syntax BExp ::= Bool

  syntax BExp ::= "(" BExp ")" [bracket]
                | IExp "<=" IExp [seqstrict]
                | IExp "<" IExp [seqstrict]
                | IExp ">=" IExp [seqstrict]
                | IExp ">" IExp [seqstrict]
                | IExp "==" IExp [seqstrict]
                | IExp "!=" IExp [seqstrict]

  syntax BExp ::= BExp "&&" BExp
                | BExp "||" BExp

  syntax Stmt ::=
      Id "=" IExp ";" [strict(2)] // Assignment
    | Stmt Stmt [left] // Sequence
    | Block // Block
    | "if" "(" BExp ")" Block "else" Block [strict(1)] // If conditional
    | "while" "(" BExp ")" Block // While loop
    | "return" IExp ";" [seqstrict] // Return statement
    | "def" Id "(" Ids ")" Block // Function definition

  syntax Block ::=
      "{" Stmt "}" // Block with statement
    | "{" "}" // Empty block
endmodule
module LESSON-22
  imports INT
  imports BOOL
  imports LIST
  imports MAP
  imports LESSON-22-SYNTAX

  configuration
    <k> $PGM:Stmt </k>
    <store> .Map </store>
    <funcs> .Map </funcs>
    <stack> .List </stack>

  // -----------------------------------------------
  rule <k> I1 + I2 => I1 +Int I2 ... </k>
  rule <k> I1 - I2 => I1 -Int I2 ... </k>
  rule <k> I1 * I2 => I1 *Int I2 ... </k>
  rule <k> I1 / I2 => I1 /Int I2 ... </k>
  rule <k> I1 ^ I2 => I1 ^Int I2 ... </k>

  rule <k> I:Id => STORE[I] ... </k>
       <store> STORE </store>

  // ------------------------------------------------
  rule <k> I1 <= I2 => I1 <=Int I2 ... </k>
  rule <k> I1 < I2 => I1 <Int I2 ... </k>
  rule <k> I1 >= I2 => I1 >=Int I2 ... </k>
  rule <k> I1 > I2 => I1 >Int I2 ... </k>
  rule <k> I1 == I2 => I1 ==Int I2 ... </k>
  rule <k> I1 != I2 => I1 =/=Int I2 ... </k>

  rule <k> B1 && B2 => B1 andBool B2 ... </k>
  rule <k> B1 || B2 => B1 orBool B2 ... </k>

  rule <k> S1:Stmt S2:Stmt => S1 ~> S2 ... </k>

  rule <k> ID = I:Int ; => . ... </k>
       <store> STORE => STORE [ ID <- I ] </store>

  rule <k> { S } => S ... </k>
  rule <k> { } => . ... </k>

  rule <k> if (true) THEN else _ELSE => THEN ... </k>
  rule <k> if (false) _THEN else ELSE => ELSE ... </k>

  rule <k> while ( BE ) BODY => if ( BE ) { BODY while ( BE ) BODY } else { } ... </k>

  rule <k> def FNAME ( ARGS ) BODY => . ... </k>
       <funcs> FS => FS [ FNAME <- def FNAME ( ARGS ) BODY ] </funcs>

  rule <k> FNAME ( IS:Ints ) ~> CONT => #makeBindings(ARGS, IS) ~> BODY </k>
       <funcs> ... FNAME |-> def FNAME ( ARGS ) BODY ... </funcs>
       <store> STORE => .Map </store>
       <stack> .List => ListItem(state(CONT, STORE)) ... </stack>

  rule <k> return I:Int ; ~> _ => I ~> CONT </k>
       <stack> ListItem(state(CONT, STORE)) => .List ... </stack>
       <store> _ => STORE </store>

  rule <k> return I:Int ; ~> . => I </k>
       <stack> .List </stack>

  syntax KItem ::= #makeBindings(Ids, Ints)
                 | state(continuation: K, store: Map)
  // ----------------------------------------------------
  rule <k> #makeBindings(.Ids, .Ints) => . ... </k>
  rule <k> #makeBindings((I:Id, IDS => IDS), (IN:Int, INTS => INTS)) ... </k>
       <store> STORE => STORE [ I <- IN ] </store>
endmodule
Next, compile this example using kompile lesson-22.k --backend haskell. If
your processor is an Apple Silicon processor, add the --no-haskell-binary
flag if the compilation fails.
Next, take the following snippet of K code and save it in lesson-22-spec.k.
This is a skeleton of the proof environment, and we will complete it as the
lesson progresses.
requires "lesson-22.k"
requires "domains.md"

module LESSON-22-SPEC-SYNTAX
  imports LESSON-22-SYNTAX
endmodule

module VERIFICATION
  imports K-EQUAL
  imports LESSON-22-SPEC-SYNTAX
  imports LESSON-22
  imports MAP-SYMBOLIC
endmodule

module LESSON-22-SPEC
  imports VERIFICATION
endmodule
Claims are stated using the claim keyword, followed by the claim itself:
claim <k> 3 + 4 => 7 ... </k>
Add this claim to the LESSON-22-SPEC module and run the K prover using the
command kprove lesson-22-spec.k. You should get back the output #Top,
which denotes the Matching Logic equivalent of true and means, in this
context, that all claims have been proven correctly.
Next, consider a claim about an if statement that has a concrete condition:
claim <k> if ( 3 + 4 == 7 ) {
            $a = 1 ;
          } else {
            $a = 2 ;
          }
       => . ... </k>
      <store> STORE => STORE [ $a <- 1 ] </store>
stating that the given program terminates (=> .), and when it does, the value
of the variable $a is set to 1, meaning that the execution will have taken
the then branch. Add this claim to the LESSON-22-SPEC module, but also add
syntax Id ::= "$a" [token]
to the LESSON-22-SPEC-SYNTAX module in order to declare $a as a token so
that it can be used as a program variable. Re-run the K prover, which should
again return #Top.
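As a further exercise in writing claims over the store, here is a sketch of one more concrete claim that combines expression evaluation with assignment. It is a hypothetical addition, not part of the original lesson, and it assumes that $a has already been declared as a token in LESSON-22-SPEC-SYNTAX as described above. The module is shown on its own for brevity; in practice the claim would sit alongside the other claims in LESSON-22-SPEC.

```k
module LESSON-22-SPEC
  imports VERIFICATION

  // Hypothetical extra claim: assigning the value of 3 + 4 to $a
  // terminates and leaves 7 stored under $a.
  claim <k> $a = 3 + 4 ; => . ... </k>
        <store> STORE => STORE [ $a <- 7 ] </store>
endmodule
```

Under the semantics given in lesson-22.k, the right-hand side of the assignment is evaluated first (the assignment production is strict in its second argument), so this claim should be provable in the same way as the previous ones.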
Next, consider a claim about an if statement that has a symbolic condition:
claim <k> $a = A:Int ; $b = B:Int ;
          if ($a < $b) {
            $c = $b ;
          } else {
            $c = $a ;
          }
       => . ... </k>
      <store> STORE => STORE [ $a <- A ] [ $b <- B ] [ $c <- ?C:Int ] </store>
  ensures (?C ==Int A) orBool (?C ==Int B)
The program in question first assigns symbolic integers A and B to program
variables $a and $b, respectively, and then executes the given if statement,
which has a symbolic condition (A < B), updating the value of the program
variable $c in both branches. The specification we give states that the if
statement terminates, with $a and $b updated, respectively, to A and B, and
$c updated to some symbolic integer value ?C. Via the ensures clause, which
is used to specify additional constraints that hold after execution, we also
state that this existentially quantified ?C equals either A or B.
Add the productions declaring $b and $c as tokens to the
LESSON-22-SPEC-SYNTAX module, add the claim to the LESSON-22-SPEC module, run
the K prover again, and observe the output, which should not be #Top this
time. This means that K was not able to prove the claim, and we now need to
understand why. We do so by examining the output, which should look as follows:
(InfoReachability) while checking the implication:
    The configuration's term unifies with the destination's term,
    but the implication check between the conditions has failed.

  #Not (
    #Exists ?C . {
        STORE [ $a <- A:Int ] [ $b <- B:Int ] [ $c <- ?C:Int ]
      #Equals
        STORE [ $a <- A:Int ] [ $b <- B:Int ] [ $c <- B:Int ]
      }
    #And
      {
        true
      #Equals
        ?C ==Int A orBool ?C ==Int B
      }
  )
#And
  <generatedTop>
    <k>
      _DotVar1
    </k>
    <store>
      STORE [ $a <- A:Int ] [ $b <- B:Int ] [ $c <- B:Int ]
    </store>
    <funcs>
      _Gen3
    </funcs>
    <stack>
      _Gen5
    </stack>
  </generatedTop>
#And
  {
    true
  #Equals
    A <Int B
  }
This output starts with a message telling us at which point the proof failed,
followed by the final state, which consists of three parts: some negative
Matching Logic (ML) constraints, the final configuration
(<generatedTop> ... </generatedTop>), and some positive ML constraints.
Generally speaking, these positive and negative constraints could arise from
various sources, such as (but not limited to) branches taken by the execution
(e.g. { true #Equals A <Int B } or #Not ( { true #Equals A <Int B } )), or
ensures constraints.
First, we examine the message:
(InfoReachability) while checking the implication:
The configuration's term unifies with the destination's term,
but the implication check between the conditions has failed.
which tells us that the structure of the final configuration is as expected,
but that some of the associated constraints cannot be proven. We next look at
the final configuration, in which the relevant item is the
<store> ... </store> cell, because it is the only one that we are reasoning
about. By inspecting its contents:
STORE [ $a <- A:Int ] [ $b <- B:Int ] [ $c <- B:Int ]
we see that we should be within the constraints of the ensures clause, since
the value of $c in the store equals B in this branch. We next examine the
negative and positive constraints of the output and, more often than not, the
goal is to instruct K how to use the information from the final configuration
and the positive constraints to falsify one of the negative constraints. This
is done through simplifications.
So, the positive constraint that we have is
{ true #Equals A <Int B }
meaning that A <Int B holds. Given the analysed program, this tells us that
we are in the then branch of the if. The negative constraint is
#Not (
  #Exists ?C . {
      STORE [ $a <- A:Int ] [ $b <- B:Int ] [ $c <- ?C:Int ]
    #Equals
      STORE [ $a <- A:Int ] [ $b <- B:Int ] [ $c <- B:Int ]
    }
  #And
    { true #Equals ?C ==Int A orBool ?C ==Int B }
)
and we observe, from the first equality, that the existential ?C should be
instantiated with B. This would make both branches of the #And true,
falsifying the outside #Not. We just need to show K how to conclude that
?C ==Int B. We do so by introducing the following simplification into the
VERIFICATION module:
rule { M:Map [ K <- V ] #Equals M [ K <- V' ] } => { V #Equals V' } [simplification]
which formalizes our internal understanding of ?C ==Int B. The rule states
that when we update the same key in the same map with two values, and the
resulting maps are equal, then the two values must be equal as well. The
[simplification] attribute indicates to K to use this rule to simplify the
state when trying to prove claims. Like function rules, simplification rules
do not complete to the top of the configuration, but instead apply anywhere
their left-hand-side matches. Re-run the K prover, which should now return
#Top, indicating that K was able to use the simplification and prove the
required claims.
while
loops. In particular, consider the following claim about a loop
claim
  <k>
    while ( 0 < $n ) {
      $s = $s + $n;
      $n = $n - 1;
    } => . ...
  </k>
  <store>
    $s |-> (S:Int => S +Int ((N +Int 1) *Int N /Int 2))
    $n |-> (N:Int => 0)
  </store>
requires N >=Int 0
which adds the sum of the first $n integers to $s, assuming the value of $n
is non-negative to begin with. This is reflected in the store by stating that,
after the execution of the loop, the original value of $s (which is set to
equal some symbolic integer S) is incremented by
((N +Int 1) *Int N /Int 2), and the value of $n always equals 0. Add $n and $s
as tokens in the LESSON-22-SPEC-SYNTAX module, the above claim to the
LESSON-22-SPEC module, and run the K prover, which should return #Top.
claim
  <k>
    def $sum($n, .Ids) {
      $s = 0 ;
      while (0 < $n) {
        $s = $s + $n;
        $n = $n - 1;
      }
      return $s;
    }
    $s = $sum(N:Int, .Ints);
  => . ... </k>
  <funcs> .Map => ?_ </funcs>
  <store> $s |-> (_ => ((N +Int 1) *Int N /Int 2)) </store>
  <stack> .List </stack>
requires N >=Int 0
Essentially, we have wrapped the while loop from claim 3.4 into a function
$sum, and then called that function with a symbolic integer N, storing the
return value in the variable $s. The specification states that this program
ends up storing the sum of the first N integers in the variable $s. Add $sum
to the LESSON-22-SPEC-SYNTAX module, the above claim to the LESSON-22-SPEC
module, and run the K prover, which should again return #Top.
Change the condition of the if statement in part 3.2 to take the else
branch and adjust the claim so that the proof passes.
The post-condition of the specification in part 3.3 loses some information.
In particular, the value of ?C is in fact the maximum of A and B. Prove the
same claim as in 3.3, but with the post-condition
ensures (?C ==Int maxInt(A, B)). For this, you will need to extend the
VERIFICATION module with two simplifications that capture the meaning of
maxInt(A:Int, B:Int). Keep in mind that any rewriting rule can be used as a
simplification; in particular, that simplifications can have requires clauses.
Following the pattern shown in part 3.4, assuming a non-negative initial
value of $b, specify and verify the following while loop:
while ( 0 < $b ) {
  $a = $a + $c;
  $b = $b - 1;
  $c = $c - 1;
}
Hint: You will not need additional simplifications; once you've got the
specification right, the proof will go through.
The goal of this second section is to supplement a beginning developer's
knowledge of K after they have gained a basic understanding of K. Each lesson
in this section can be completed independently in order to learn about a
particular facet of the K language. The lessons are written to provide basic
understanding of less commonly-used features of K to someone who is still
learning K. For more complete references of these features, the reader ought to
consult the User Manual.
The reader ought to be able to complete lessons in this section as needed in
order to learn about specific features of interest, but if desired, can also
complete the entire section in one go. Someone who has completed this entire
section ought to be able to read and understand most K specifications, as well
as write their own specifications of some complexity, and use them to perform
most common K-related tasks. They can then read about specific lessons in
Section 3: Advanced K Concepts if they want to
learn more.
The purpose of this lesson is to explain the behavior of the macro, macro-rec,
alias, and alias-rec production attributes, as well as the anywhere rule
attribute. These attributes control how the rules associated with them are
applied.
Thus far in the K tutorial, we have described three different types of rules:
top-level rewrite rules, function rules, and simplification rules.
This lesson introduces three more types of rules, the first of which are
macros. A production is a macro if it has the macro attribute, and all
rules whose top symbol on the left hand side is a macro are macro rules
which define the behavior of the macro. Like function rules and simplification
rules, macro rules do not participate in cell completion. However, unlike
function rules and simplification rules, macro rules are applied statically
before rewriting begins, and the macro symbol is expected to no longer appear
in the initial configuration for rewriting once all macros in that
configuration are rewritten.
The rationale behind macros is that they allow you to define one piece of
syntax in terms of another piece of syntax without any runtime overhead
associated with the cost of rewriting one to the other. This process is a
common one in programming language design and specification and is referred to
as desugaring. The syntax that is transformed is typically also referred to as
syntactic sugar for another type of syntax. For example, in a language with if
statements and curly braces, you could write the following fragment
(lesson-01.k):
module LESSON-01
  imports BOOL

  syntax Stmt ::= "if" "(" Exp ")" Stmt [macro]
                | "if" "(" Exp ")" Stmt "else" Stmt
                | "{" Stmts "}"
  syntax Stmts ::= List{Stmt,""}
  syntax Exp ::= Bool

  rule if ( E ) S => if ( E ) S else { .Stmts }
endmodule
In this example, we see that an if statement without an else clause is
defined in terms of one with an else clause. As a result, we would only
need to give a single rule for how to rewrite if statements, rather than
two separate rules for two types of if statements. This is a common pattern
for dealing with program syntax that contains an optional component.
It is worth noting that by default, macros are not applied recursively. To be
more precise, by default a macro that arises as a result of the expansion of
the same macro is not rewritten further. This is primarily to simplify the
macro expansion process and reduce the risk that improperly defined macros will
lead to non-terminating behavior.
It is possible, however, to tell K to expand a macro recursively. To do this,
simply replace the macro attribute with the macro-rec attribute. Note that
K does not do any kind of checking to ensure termination here, so it is
important that rules be defined correctly to always terminate, otherwise the
macro expansion phase will run forever. Fortunately, in practice it is very
simple to ensure this property for most of the types of macros that are
typically used in real-world semantics.
Using a Nat sort containing the constructors 0 and S (i.e., a Peano-style
axiomatization of the natural numbers where S(N) = N + 1, S(S(N)) = N + 2,
etc.), write a macro that will compute the sum of two numbers.
NOTE: This lesson introduces the concept of "aliases", which are a variant
of macros. While similar, this is different from the concept of "aliases" in
matching logic, which is introduced in Lesson 2.16.
Macros can be very useful in helping you define a programming language.
However, they can be disruptive while pretty printing a configuration. For
example, you might write a set of macros that transforms the code the user
wrote into equivalent code that is slightly harder to read. This can make it
more difficult to understand the code when it is pretty printed as part of the
output of rewriting.
K defines a relatively straightforward but novel solution to this problem,
which is known as a K alias. An alias in K is very similar to a macro,
with the exception that the rewrite rule will also be applied backwards
during the pretty-printing process.
It is very simple to make a production be an alias instead of a macro: simply
use the alias or alias-rec attributes instead of the macro or macro-rec
attributes. For example, if the example involving if statements above was
declared using an alias instead of a macro, the Stmt term
if (E) {} else {} would be pretty-printed as if (E) {}. This is because during
pretty-printing, the term participates in another macro-expansion pass.
However, this macro expansion step will only apply rules with the alias or
alias-rec attribute, and, critically, it will reverse the rule by treating the
left-hand side as if it were the right-hand side, and vice versa.
This can be very useful to allow you to define one construct in terms of
another while still being able to pretty-print the result as if it were
the original term in question. This can be especially useful for applications
of K where we are taking the output of rewriting and attempting to use it as
a code fragment that we then execute, such as with test generation.
Modify LESSON-01 above to use an alias instead of a macro and experiment
with how various terms are pretty-printed by invoking krun on them.
anywhere rules
The last type of rule introduced in this lesson is the anywhere rule. An
anywhere rule is specified by adding the anywhere attribute to a rule. Such a
rule is similar to a function rule in that it does not participate in cell
completion, and will apply anywhere that the left-hand-side matches in the
configuration, but distinct in that the symbol in question can still be matched
against in the left-hand side of other rules, even during concrete rewriting.
The reasoning behind this is that instead of the symbol in question being a
constructor, it is a constructor modulo the axioms defined with the anywhere
attribute. Essentially, the rules with the anywhere attribute will apply as
soon as the symbol appears in the right-hand side of a rule being applied,
but the symbol in question will still be treated as a symbol that can be
matched on if it is not completely removed by those rules.
This can be useful in certain cases to allow you to define transformations over
particular pieces of syntax while still generally giving those pieces of syntax
another meaning when the anywhere rule does not apply. For example, the ISO C
standard defines the semantics of *&x as exactly equal to x, with no
reading or writing of memory taking place, and the K semantics of C implements
this functionality using an anywhere rule that is applied at compilation time.
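As a rough illustration of the idea, the sketch below declares two
hypothetical constructors, deref and addrOf, standing in for the C operators
* and &, and uses an anywhere rule to cancel them. The module name and the
constructor names are assumptions made for this example; they are not taken
from the K semantics of C.

```k
// Hypothetical module and constructor names, for illustration only.
module ANYWHERE-SKETCH
  imports ID

  syntax Exp ::= Id
               | deref(Exp)    // stands in for *e
               | addrOf(Exp)   // stands in for &e

  // Applies wherever deref(addrOf(E)) occurs in the configuration,
  // without needing to be anchored in a particular cell.
  rule deref(addrOf(E:Exp)) => E [anywhere]
endmodule
```

With this rule in place, a term such as deref(addrOf(x)) would be rewritten to
x wherever it occurs, while deref and addrOf remain ordinary constructors that
other rules can still match on when the anywhere rule does not apply.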
NOTE: the anywhere attribute is only implemented on the LLVM backend
currently. Attempting to use it in a semantics that is compiled with the
Haskell backend will result in an error being reported by the compiler. This
should be remembered when using this attribute, as it may not be suitable for
a segment of a semantics which is intended to be symbolically executed.
anywhere rules rather than top-level rewrite rules.
Click here to return to the Table of Contents for Section 2.
NOTE: The K User Manual is still under construction; some features of K
may have partial or missing documentation.
The K Framework is a programming language and system design toolkit made for
practitioners and researchers alike.
K For Practitioners:
K is a framework for deriving programming language tools from their semantic
specifications.
Typically, programming language tool development follows a similar pattern.
After a new programming language is designed, separate teams will develop
separate language tools (e.g. a compiler, interpreter, parser, symbolic
execution engine, etc). Code reuse is uncommon. The end result is that for each
new language, the same basic tools and patterns are re-implemented again and
again.
K approaches the problem differently: it generates each of these tools from a
single language specification. Programming language design and tool
implementation become separate concerns. The end result is that the exercise of
designing new languages and their associated tooling is now reduced to
developing a single language specification from which we derive our tooling for
free.
K For Researchers:
K is a configuration- and rewrite-based executable semantic framework.
In more detail, K specifications are:
K specifications are compiled into particular matching logic theories, giving
them a simple and expressive semantics. K semantic rules are implicitly defined
over the entire configuration structure, but omit unused cells, enabling a
highly modular definitional style. Furthermore, K has been used to develop
programming languages, type systems, and formal analysis tools.
As mentioned in the Why K? section above, the K Framework is designed as a
collection of language-generic command-line interface (CLI) tools which revolve
around K specifications. These tools cover a broad range of uses, but they
typically fall into one of the following categories:
The main user-facing K tools include:
- kompile - the K compiler driver
- kparse - the standalone K parser and abstract syntax tree (AST)
- krun - the K interpreter and symbolic execution engine driver
- kprove - the K theorem prover
This user manual is designed to be a tool reference.
In particular, it is not designed to be a tutorial on how to write K
specifications or to teach the logical foundations of K. New K users should
consult our dedicated K tutorial, or the more language-design oriented
PL tutorial.
Researchers seeking to learn more about the logic underlying K are encouraged
to peruse the
growing literature on K and matching logic.
We will consider the manual complete when it provides a complete description of
all user-facing K tools and features.
Since K specifications are the primary input into the entire system, let us
take a moment to describe them. At the highest level, K specifications describe
a programming language or system using three different pieces: the system
primitives, the system state, and the system behavior.
K specifications are then defined by a collection of sentences which
correspond to the three concepts above:
- syntax declarations encode the system primitives;
- configuration declarations encode the system state;
- context and rule declarations encode the system behavior.
K sentences are then organized into one or more modules which are stored in one
or more files. In this scheme, files may require other files and modules may
import other modules, giving rise to a hierarchy of files and modules. We
give an intuitive sketch of the two levels of grouping in the diagram below:
example.k file
+=======================+
| requires ".." --------|--> File_1
| ... |
| requires ".." --------|--> File_N
| |
| +-----------------+ |
| | module .. | |
| | imports .. ---|--|--> Module_1
| | ... | |
| | imports .. ---|--|--> Module_M
| | | |
| | sentence_1 | |
| | ... | |
| | sentence_K | |
| | endmodule | |
| +-----------------+ |
| |
+=======================+
where:
- file or module identifiers are denoted by double dots (..);
- potential repetitions are denoted by triple dots (...).
In the end, we require that the file and module hierarchies both form a
directed acyclic graph (DAG). That is, no file may recursively require itself,
and likewise, no module may recursively import itself.
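To make the two levels of grouping concrete, here is a minimal sketch of a
single file containing two modules. The file name example.k, the module names,
and the Exp sort are hypothetical and chosen only for illustration; domains.md
is the file that provides the builtin INT modules.

```k
// example.k (hypothetical file name)
requires "domains.md"        // file-level dependency

module EXAMPLE-SYNTAX
  imports INT-SYNTAX         // module-level dependency
  // syntax sentence: system primitives
  syntax Exp ::= Int
               | Exp "+" Exp [left]
endmodule

module EXAMPLE
  imports EXAMPLE-SYNTAX
  imports INT
  // configuration sentence: system state
  configuration <k> $PGM:Exp </k>
  // rule sentence: system behavior
  rule <k> I1:Int + I2:Int => I1 +Int I2 ... </k>
endmodule
```

Here EXAMPLE imports EXAMPLE-SYNTAX and never the other way around, so the
module hierarchy stays acyclic.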
We now zoom in further to discuss the various kinds of sentences contained in K
specifications:
sentences that define our system's primitives, including:
sentences that define our system's state, including:
sentences that define our system's behavior, including:
We now examine how the K tools are generally used. The main input to all of the
K tools is a K specification. For efficiency reasons, this specification is
first compiled into an intermediate representation called Kore. Once we have
obtained this intermediate representation, we can use it to do:
We represent the overall process using the graphic below:
K Compilation Process
+============================================================+
| +---------+ |
| K Specification ---| kompile |--> Kore Specification --+ |
| +---------+ | |
+=========================================================|==+
|
K Execution Process |
+=========================================================|==+
| | |
| +-------------------------------------------+ |
| | |
| | +---------+ |
| K Term ----+-------| kparse |--> K Term |
| | +---------+ |
| | |
| | +---------+ |
| K Term ----+-------| krun |--> K Term |
| | +---------+ |
| | |
| | +---------+ |
| K Claims --+-------| kprove |--> K Claims |
| +---------+ |
| |
+============================================================+
where:
- K terms are either parsed (using kparse) or executed (using krun);
- K claims are proven (using kprove).
K Compilation Process:
Let us start with a description of the compilation process. According to the
above diagram, the compiler driver is called kompile. For our purposes, it is
enough to view the K compilation process as a black box that transforms a K
specification into a lower-level Kore specification that encodes the same
information, but that is easier to work with programmatically.
K Execution Process:
We now turn our attention to the K execution process. Abstractly, we can divide
the K execution process into the following stages:
(kparse, krun, or kprove)
Note that all of the above steps performed in the K execution process are fully
prescribed by the input K specification. Of course, there are entire languages
devoted to encoding each of these stages individually, e.g., flex for lexers,
bison for parsers, etc. What K offers is a consistent language to package the
above concepts in a way that we believe is convenient and practical for a wide
range of uses.
K modules are declared at the top level of a K file. They begin with the
module keyword and are followed by a module ID and an optional set of
attributes. They continue with zero or more imports and zero or more sentences
until the endmodule keyword is reached.
A module ID consists of an optional # at the beginning, followed by one or
more components separated by hyphens. Each component can contain letters,
numbers, or underscores.
After the module ID, attributes can be specified in square brackets. See below
for an (incomplete) list of allowed module attributes.
Following the attributes, a module can contain zero or more imports. An
import consists of the import or imports keywords followed by a module ID.
An import tells the compiler that this module should contain all the sentences
(recursively) contained by the module being imported.
Imports can be public or private. By default, they are public, which
means that all the imported syntax can be used by any module that imports the
module doing the import. However, you can explicitly override the visibility
of the import with the public or private keyword immediately prior to the
module name. A module imported privately does not export its syntax to modules
that import the module doing the import.
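The following sketch shows how this might look in practice. All module names
and the functions double, quadruple, and eight are hypothetical and exist only
for this example.

```k
module HELPERS
  imports INT
  syntax Int ::= double(Int) [function, total]
  rule double(I) => I *Int 2
endmodule

module ARITH
  imports INT
  // double(_) can be used inside ARITH, but is not re-exported.
  imports private HELPERS
  syntax Int ::= quadruple(Int) [function, total]
  rule quadruple(I) => double(double(I))
endmodule

module CLIENT
  // Public by default: quadruple(_) and the INT syntax are visible here.
  imports ARITH
  syntax Int ::= eight() [function, total]
  rule eight() => quadruple(2)
endmodule
```

Under the visibility rules described above, a rule in CLIENT that mentioned
double(_) would be expected to fail to parse, because ARITH imports HELPERS
privately.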
Following imports, a module can contain zero or more sentences. A sentence can
be a syntax declaration, a rule, a configuration declaration, a context, a
claim, or a context alias. Details on each of these can be found in subsequent
sections.
private attribute
If the module is given the private attribute, all of its imports and syntax
are private by default. Individual pieces of syntax can be made public with
the public attribute, and individual imports can be made public with the
public keyword. See relevant sections on syntax and modules for more details
on what it means for syntax and imports to be public or private.
symbolic and concrete attribute
These attributes may be placed on modules to indicate that they should only
be used by the Haskell and LLVM backends respectively. If the definition is
compiled on the opposite backend, they are implicitly removed from the
definition prior to parsing anywhere they are imported. This can be useful when
used in limited capacity in order to provide alternate semantics for certain
features on different backends. It should be used sparingly as it makes it more
difficult to trust the correctness of your semantics, even in the presence of
testing.
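As a small sketch of how this could be used, the module below is marked
symbolic so that its lemma is only present when compiling for the Haskell
backend. The module names LEMMAS-SYMBOLIC and MAIN and the lemma itself are
hypothetical, chosen only to illustrate the attribute.

```k
// Kept only when compiling for the Haskell backend.
module LEMMAS-SYMBOLIC [symbolic]
  imports INT
  rule X:Int +Int 0 => X [simplification]
endmodule

module MAIN
  imports INT
  // According to the description above, this import is implicitly
  // removed when the definition is compiled for the LLVM backend.
  imports LEMMAS-SYMBOLIC
endmodule
```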
We have added a syntax to Productions which allows non-terminals to be given a
name in productions. This significantly improves the ability to document K, by
providing a way to explicitly explain what a field in a production corresponds
to instead of having to infer it from a comment or from the rule body.
The syntax is:
name: Sort
This syntax can be used anywhere in a K definition that expects a non-terminal.
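For instance, a sketch along the following lines names the fields of two
productions. The module name, sorts, and field names are hypothetical and serve
only to illustrate the name: Sort form.

```k
module NAMED-NONTERMINALS-SKETCH
  imports INT-SYNTAX
  imports BOOL-SYNTAX

  syntax Exp  ::= Int | Bool
  // Each non-terminal is documented by the name that precedes its sort.
  syntax Stmt ::= "while" "(" cond: Exp ")" body: Stmt
                | transfer(sender: Int, receiver: Int, amount: Int)
                | "{" "}"
endmodule
```

The names make it clear, without a comment, which Int in transfer is the
sender and which is the amount.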
symbol(_) attribute
By default, when compiling a definition, K generates a unique "mangled" label
identifier for each syntactic production. These identifiers can be used to
reference productions externally, for example when constructing terms by hand
or programmatically via Pyk.
The symbol(_) attribute can be applied to a production to control the precise
identifier for a production that appears in a compiled definition. For example:
module SYMBOLS
    syntax Foo ::= foo() [symbol(foo)]
                 | bar()
endmodule
Here, the compiled definition will contain the following symbol declarations:
symbol Lblfoo{}() ...
symbol Lblbar'LParRParUnds'SYMBOLS'Unds'Foo{}() ...
The compiler enforces uniqueness[1] of symbol names specified in
this way; it would be an error to apply symbol(foo) to another production in
the module above. Additionally, symbol(_) with an argument may not co-occur
with the klabel(_) attribute (see below).
overload attribute
K supports subsort overloading[2] on symbols, whereby a
constructor can have a more specific sort for certain arguments. For example,
consider the following productions derived from a C-like language semantics:
syntax Exp ::= LVal
             | Exp "." Id
syntax LVal ::= LVal "." Id
Here, it is useful for the result of the dot operator to be an LVal if the
left-hand side is itself an LVal. However, there is an issue with the code
as written: if L() is a term of sort LVal, then the program L() . x has a
parsing ambiguity between the two productions for the dot operator. To resolve
this, we can mark the productions as overloads:
syntax Exp ::= LVal
             | Exp "." Id [overload(_._)]
syntax LVal ::= LVal "." Id [overload(_._)]
Now, the parser will select the most specific overloaded production when it
resolves ambiguities in L() . x (that is, L() . x parses to a term of sort
LVal).
Formally, the compiler organises productions into a partial order that defines
the overload relation as follows. We say that P is a more specific overload
of Q if:
- P and Q have the same overload(_) attribute. Note that the argument acts as
  a key grouping productions together.
- Let S_P be the sort of P, and S_p1 etc. be the sorts of its arguments
  (likewise for Q). The tuple (S_P, S_p1, ..., S_pN) must be elementwise less
  than (S_Q, S_q1, ..., S_qN) according to the definition's subsorting
  relation. That is, P is a restriction of Q; when its arguments are more
  precise, we can give the result a more precise sort.
klabel(_) and symbol attributes
Note: the klabel(_), symbol approach described in this section is a legacy
feature that will be removed in the future. New code should use the symbol(_)
and overload(_) attributes to opt into explicit naming and overloading
respectively.
References here to "overloading" are explained in the section above; the use
of the klabel(_) attribute without symbol is equivalent to the new
overload(_) syntax.
By default K generates for each syntax definition a long and obfuscated klabel
string, which serves as a unique internal identifier and is also used in the
kast format of that syntax. If we need to reference a certain syntax production
externally, we have to manually define the klabels using the klabel attribute.
One example of where you would want to do this is to be able to refer to a given
symbol via the syntax priority attribute, or to enable overloading of a
given symbol.
If you only provide the klabel attribute, you can use the provided klabel to
refer to that symbol anywhere in the frontend K code. However, the internal
identifier seen by the backend for that symbol will still be the long obfuscated
generated string. Sometimes you want control over the internal identifier used
as well, in which case you use the symbol attribute. This tells the frontend to
use whatever the declared klabel is directly as the internal identifier.
For example:
module MYMODULE
    syntax FooBarBaz ::= #Foo( Int, Int ) [klabel(#Foo), symbol] // symbol1
                       | #Bar( Int, Int ) [klabel(#Bar)]         // symbol2
                       | #Baz( Int, Int )                        // symbol3
endmodule
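Here, we have that:
- In frontend K, symbol1 can be referred to as #Foo (from klabel(#Foo)), and
  the backend will see 'Hash'Foo as the symbol name.
- In frontend K, symbol2 can be referred to as #Bar (from klabel(#Bar)), and
  the backend will see
  'Hash'Bar'LParUndsCommUndsRParUnds'MYMODULE'Unds'FooBarBaz'Unds'Int'Unds'Int
  as the symbol name.
- In frontend K, symbol3 can be referred to as
  #Baz(_,_)_MYMODULE_FooBarBaz_Int_Int (from auto-generated klabel), and
  the backend will see
  'Hash'Baz'LParUndsCommUndsRParUnds'MYMODULE'Unds'FooBarBaz'Unds'Int'Unds'Int
  as the symbol name.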
The symbol provided must be unique to this definition. This is enforced by
K. In general, it's recommended to use the symbol attribute whenever you use
klabel unless you explicitly have a reason not to (e.g. you want to overload
symbols, or you're using a deprecated backend). It can be very helpful to use
the symbol attribute for debugging, as many debugging messages are printed in
Kast format which will be more readable with the symbol names you explicitly
declare. In addition, if you are programmatically manipulating definitions via
the JSON Kast format, building terms using the user-provided pretty
symbol, klabel(...) is easier and less error-prone, particularly if the
auto-generation process for klabels changes.
When using K's support for syntactic lists, a production like:
syntax Ints ::= List{Int, ","} [symbol(ints)]
will desugar into two productions:
syntax Ints ::= Int "," Ints [symbol(ints)]
syntax Ints ::= ".Ints" [symbol(List{"ints"})]
Note that the symbol for the terminator of the list has been generated
automatically from the label on the original production. It is possible to
control what the terminator's label is using the terminator-symbol(_)
attribute. For example:
syntax Ints ::= List{Int, ","} [symbol(ints), terminator-symbol(.ints)]
will desugar into two productions:
syntax Ints ::= Int "," Ints [symbol(ints)]
syntax Ints ::= ".Ints" [symbol(.ints)]
It is an error to apply terminator-symbol(_) to a non-production sentence, or
to a production that does not declare a syntactic list.
bracket attributes
Some syntax productions, like the rewrite operator, the bracket operator, and
the #if #then #else #fi operator, cannot have their precise type system
expressed using only concrete sorts.
Prior versions of K solved this issue by using the K sort in this case, but
this introduces inexactness in which poorly typed terms can be created even
without having a cast operator present in the syntax, which is a design
consideration we would prefer to avoid.
It also introduces cases where terms cannot be placed in positions where they
ought to be well sorted unless their return sort is made to be KBott, which in
turn vastly complicates the grammar and makes parsing much slower.
To address this, we provide a new syntax for parametric productions
in K. This allows you to express syntax that has a sort signature based on
parametric polymorphism. We do this by means of an optional
curly-brace-enclosed list of parameters prior to the return sort of a
production.
Some examples:
syntax {Sort} Sort ::= "(" Sort ")" [bracket]
syntax {Sort} KItem ::= Sort
syntax {Sort} Sort ::= KBott
syntax {Sort} Sort ::= Sort "=>" Sort
syntax {Sort} Sort ::= "#if" Bool "#then" Sort "#else" Sort "#fi"
syntax {Sort1, Sort2} Sort1 ::= "#fun" "(" Sort2 "=>" Sort1 ")" "(" Sort2 ")"
Here we have:
Note the last case, in which two different parameters are specified separated
by a comma. This indicates that we have multiple independent parameters which
must be the same each place they occur, but not the same as the other
parameters.
In practice, because every sort is a subsort of K, the Sort2 parameter in #6
above does nothing during parsing. It cannot actually reject any parse,
because it can always infer that the sort of the argument and parameter are K,
and it has no effect on the resulting sort of the term. However, it will
nevertheless affect the Kore generated from the term by introducing an
additional parameter to the symbol generated for the term.
function and total attributes

Many times it becomes easier to write a semantics if you have "helper"
functions written which can be used in the RHS of rules. The function
attribute tells K that a given symbol should be simplified immediately when it
appears anywhere in the configuration. Semantically, it means that evaluation
of that symbol will result in at most one return value (that is, the symbol is
a partial function).
The total
attribute indicates that a symbol cannot be equal to matching logic
bottom; in other words, it has at least one value for every possible set of
arguments. It can be added to a production with the function
attribute to
indicate to the symbolic reasoning engine that a given symbol is a
total function, that is it has exactly one return value for every possible
input. Other uses of the total
attribute (i.e., on multi-valued symbols to
indicate they always have at least one value) are not yet implemented.
For example, here we define the _+Word_
total function and the _/Word_
partial function, which can be used to do addition/division modulo
2 ^Int 256
. These functions can be used anywhere in the semantics where
integers should not grow larger than 2 ^Int 256
. Notice how _/Word_
is
not defined when the denominator is 0
.
syntax Int ::= Int "+Word" Int [function, total]
             | Int "/Word" Int [function]

rule I1 +Word I2 => (I1 +Int I2) modInt (2 ^Int 256)
rule I1 /Word I2 => (I1 /Int I2) modInt (2 ^Int 256) requires I2 =/=Int 0
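The same distinction between total and partial helpers can be sketched in a self-contained module. The module name HELPER-SKETCH and the symbols max2 and safeDiv are hypothetical names chosen for this illustration; only the INT module and its operators are taken from K's builtin domains.

```k
module HELPER-SKETCH
  imports INT

  // Hypothetical helper: the two rules together cover every pair of
  // integers, so the function can be marked total.
  syntax Int ::= max2(Int, Int) [function, total]
  rule max2(A, B) => A requires A >=Int B
  rule max2(A, B) => B requires A <Int B

  // Hypothetical helper: no rule applies when the divisor is 0, so this
  // is a partial function and must not be marked total.
  syntax Int ::= safeDiv(Int, Int) [function]
  rule safeDiv(A, B) => A /Int B requires B =/=Int 0
endmodule
```

Marking safeDiv as total here would be unsound, because the symbolic reasoning engine would then assume a value exists for safeDiv(A, 0).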
freshGenerator attribute

In K, you can access "fresh" values in a given domain using the syntax
!VARNAME:VarSort
(with the !
-prefixed variable name). This is supported for
builtin sorts Int
and Id
already. For example, you can generate fresh
memory locations for declared identifiers as such:
rule <k> new var x ; => . ... </k>
     <env> ENV => ENV [ x <- !I:Int ] </env>
     <mem> MEM => MEM [ !I <- 0 ] </mem>
Each time a !
-prefixed variable is encountered, a new integer will be used,
so each variable declared with new var _ ;
will get a unique position in the
<mem>
.
Sometimes you want to have generation of fresh constants in a user-defined
sort. For this, K will still generate a fresh Int
, but can use a converter
function you supply to turn it into the correct sort. For example, here we can
generate fresh Foo
s using the freshFoo(_)
function annotated with
freshGenerator
.
syntax Foo ::= "a" | "b" | "c" | d ( Int )
syntax Foo ::= freshFoo ( Int ) [freshGenerator, function, total]

rule freshFoo(0) => a
rule freshFoo(1) => b
rule freshFoo(2) => c
rule freshFoo(I) => d(I) [owise]

rule <k> new var x ; => . ... </k>
     <env> ENV => ENV [ x <- !I:Int ] </env>
     <mem> MEM => MEM [ !I <- !F:Foo ] </mem>
Now each newly allocated memory slot will have a fresh Foo
placed in it.
token attribute

The token
attribute signals to the Kore generator that the associated sort
will be inhabited by domain values. Sorts inhabited by domain values must not
have any constructors declared.
syntax Bytes [hook(BYTES.Bytes), token]
[token] sorts

You can convert tokens of one sort into tokens of another via Strings by defining
functions implemented by builtin hooks.
The hook STRING.token2string
allows conversion of any token to a string:
syntax String ::= FooToString(Foo) [function, total, hook(STRING.token2string)]
Similarly, the hook STRING.string2token
allows the inverse:
syntax Bar ::= StringToBar(String) [function, total, hook(STRING.string2token)]
WARNING: This sort of conversion does NOT do any sort of parsing or validation.
Thus, we can create arbitrary tokens of any sort:
StringToBar("The sun rises in the west.")
Composing these two functions lets us convert from Foo to Bar:
syntax Bar ::= FooToBar(Foo) [function]
rule FooToBar(F) => StringToBar(FooToString(F))
#Layout sort

Productions for the #Layout
sort are used to describe tokens that are
considered "whitespace". The scanner removes tokens matching these productions
so they are not even seen by the parser. Below, we use it to define
lines beginning with ;
(semicolon) as comments.
syntax #Layout ::= r"(;[^\\n\\r]*)"    // Semi-colon comments
                 | r"([\\ \\n\\r\\t])" // Whitespace
prec attribute

Consider the following naive attempt at creating a language syntax that
allows two types of variables: names that contain underbars, and names that
contain sharps/hashes/pound-signs:
syntax NameWithUnderbar ::= r"[a-zA-Z][A-Za-z0-9_]*" [token]
syntax NameWithSharp ::= r"[a-zA-Z][A-Za-z0-9_#]*" [token]
syntax Pgm ::= underbar(NameWithUnderbar)
             | sharp(NameWithSharp)
Although it seems that K has enough information to parse the programs
underbar(foo) and sharp(foo), the lexer does not take into account
whether a token is being parsed for the sharp or for the underbar
production. It chooses an arbitrary sort for the token foo (perhaps
NameWithUnderbar). Thus, during parsing it is unable to construct a valid term
for one of those programs (sharp(foo)) and produces the error message:
Inner Parser: Parse error: unexpected token 'foo'.
Since calculating inclusions and intersections between regular expressions is
tricky, we must provide this information to K. We do this via the prec(N)
attribute. The lexer will always prefer longer tokens to shorter tokens.
However, when it has to choose between two different tokens of equal length,
token productions with higher precedence are tried first. Note that the default
precedence value is zero when the prec
attribute is not specified.
For example, the BUILTIN-ID-TOKENS
module defines #UpperId
and #LowerId
with
the prec(2)
attribute.
syntax #LowerId ::= r"[a-z][a-zA-Z0-9]*" [prec(2), token]
syntax #UpperId ::= r"[A-Z][a-zA-Z0-9]*" [prec(2), token]
Furthermore, we also need to make sorts with more specific tokens subsorts of ones with more
general tokens. We add the token attribute to this production so that all
tokens of a particular sort are marked with the sort they are parsed as and not a
subsort thereof. For example, we get underbar(#token("foo", "NameWithUnderbar"))
instead of underbar(#token("foo", "#LowerId")).
imports BUILTIN-ID-TOKENS

syntax NameWithUnderbar ::= r"[a-zA-Z][A-Za-z0-9_]*" [prec(1), token]
                          | #UpperId [token]
                          | #LowerId [token]
syntax NameWithSharp ::= r"[a-zA-Z][A-Za-z0-9_#]*" [prec(1), token]
                       | #UpperId [token]
                       | #LowerId [token]
syntax Pgm ::= underbar(NameWithUnderbar)
             | sharp(NameWithSharp)
unused attribute

K will warn you if you declare a symbol that is not used in any of the rules of
your definition. Sometimes this is intentional, however; in this case, you can
suppress the warning by adding the unused
attribute to the production or
cell.
syntax Foo ::= foo() [unused]

configuration <foo unused=""> .K </foo>
deprecated attribute

Symbols can be marked as deprecated by adding the deprecated
attribute to
their declaration. If that symbol subsequently appears in the definition (in a
rule, context, context alias or configuration), the compiler will issue a
warning.
syntax Foo ::= foo() [deprecated]
rule foo() => . // warning on this line
Unlike most other parser generators, K combines the task of parsing with AST
generation. A production declared with the syntax
keyword in K is both a
piece of syntax used when parsing, and a symbol that is used when rewriting.
As a result, it is generally convenient to describe expression grammars using
priority and associativity declarations rather than explicitly transforming
your grammar into a series of nonterminals, one for each level of operator
precedence. Thus, for example, a simple grammar for addition and multiplication
will look like this:
syntax Exp ::= Exp "*" Exp | Exp "+" Exp
However, this grammar is ambiguous. The term x+y*z
might refer to x+(y*z)
or to (x+y)*z
. In order to differentiate this, we introduce a partial
ordering between productions known as priority. A symbol "has tighter priority"
than another symbol if the first symbol can appear under the second, but the
second cannot appear under the first without a bracket. For example, in
traditional arithmetic, multiplication has tighter priority than addition,
which means that x+y*z
cannot parse as (x+y)*z
because the addition
operator would appear directly beneath the multiplication, which is forbidden
by the priority filter.
Priority is applied individually to each possible ambiguous parse of a term. It
then either accepts or rejects that parse. If there is only a single remaining
parse (after all the other disambiguation steps have happened), this is the
parse that is chosen. If all the parses were rejected, it is a parse error. If
multiple parses remain, they might be resolved by further disambiguation such
as via the prefer
and avoid
attributes, but if multiple parses remain after
disambiguation finishes, this is an ambiguous parse error, indicating there is
not a unique parse for that term. In the vast majority of cases, this is
an error and indicates that you ought to either change your grammar or add
brackets to the term in question.
Priority is specified in K grammars by means of one of two different
mechanisms. The first, and simplest, simply replaces the |
operator in a
sequence of K productions with the >
operator. This operator indicates that
everything prior to the >
operator (including transitively) binds tighter
than what comes after. For example, a more complete grammar for simple
arithmetic might be:
syntax Exp ::= Exp "*" Exp | Exp "/" Exp > Exp "+" Exp | Exp "-" Exp
This indicates that multiplication and division bind tighter than addition
and subtraction, but that there is no relationship in priority between
multiplication and division.
As you may have noticed, this grammar is also ambiguous. x*y/z
might refer to
x*(y/z)
or to (x*y)/z
. Indeed, if we removed division and subtraction
entirely, the grammar would still be ambiguous: x*y*z
might parse as
x*(y*z)
, or as (x*y)*z
. To resolve this, we introduce another feature:
associativity. Roughly, associativity tells us how symbols are allowed to nest
within other symbols with the same priority. If a set of symbols is left
associative, then symbols in that set cannot appear as the rightmost child
of other symbols in that set. If a set of symbols is right associative, then
symbols in that set cannot appear as the leftmost child of other symbols in
that set. Finally, if a set of symbols is non-associative, then symbols
in that set cannot appear as the rightmost or leftmost child of other symbols
in that set. For example, in the above example, if addition and subtraction
are left associative, then x+y+z will parse as (x+y)+z and x+y-z will
parse as (x+y)-z (because the other parse will have been rejected).
You might notice that this seems to apply only to binary infix operators. In
fact, the real behavior is slightly more complicated. Priority and
associativity (for technical reasons that go beyond the scope of this document)
really only apply when the rightmost or leftmost item in a production is a
nonterminal. If the rightmost nonterminal is followed by a terminal (or
respectively the leftmost preceded), priority and associativity do not apply.
Thus we can generalize these concepts to arbitrary context-free grammars.
Note that in some cases, this is not the behavior you want. You may actually
want to reject parses even though the leftmost and rightmost item in a
production are terminals. You can accomplish this by means of the
applyPriority
attribute. When placed on a production, it tells the parser
which nonterminals of a production the priority filter ought to reject children
under, overriding the default behavior. For example, I might have a production
like syntax Exp ::= foo(Exp, Exp) [applyPriority(1)]
. This tells the parser
to reject terms with looser priority binding under the first Exp
, but not
the second. By default, with this production, the priority filter would apply
to neither position, because the first and last items of the production
are both terminals.
Associativity is specified in K grammars by means of one of two different
mechanisms. The first, and simplest, adds the associativity of a priority block
of symbols prior to that block. For example, we can remove the remaining
ambiguities in the above grammar like so:
syntax Exp ::= left:
               Exp "*" Exp
             | Exp "/" Exp
             > right:
               Exp "+" Exp
             | Exp "-" Exp
This indicates that multiplication and division are left-associative, i.e., after
symbols with higher priority are parsed as innermost, symbols are nested with
the rightmost on top. Addition and subtraction are right associative, which
is the opposite and indicates that symbols are nested with the leftmost on top.
Note that this is similar but different from evaluation order, which also
concerns itself with the ordering of symbols, which is described in the next
section.
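The left: and right: blocks shown above have a third sibling, non-assoc:, which corresponds to the non-associative case described earlier. The following is a minimal sketch; the module name CMP-SKETCH and the choice of operators are assumptions made for illustration only.

```k
module CMP-SKETCH
  imports INT-SYNTAX

  syntax Exp ::= Int
               | "(" Exp ")" [bracket]
               > left:
                 Exp "+" Exp
               > non-assoc:
                 Exp "<" Exp
endmodule
```

Under this grammar, 1 + 2 < 3 should parse as (1 + 2) < 3 because addition binds tighter than comparison. The term 1 < 2 < 3 should be rejected, since a non-associative symbol may appear as neither the leftmost nor the rightmost child of itself; the user has to write (1 < 2) < 3 or 1 < (2 < 3) explicitly.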
You may note we have not yet introduced the second syntax for priority
and associativity. In some cases, syntax for a grammar might be spread across
multiple modules, sometimes for very good reasons with respect to code
modularity. As a result, it becomes infeasible to declare priority and
associativity inline within a set of productions, because the productions
are not contiguous within a single file.
For this purpose, we introduce the equivalent syntax priority
,
syntax left
, syntax right
, and syntax non-assoc
declarations. For
example, the above grammar can be written equivalently as:
syntax Exp ::= Exp "*" Exp [group(mult)]
             | Exp "/" Exp [group(div)]
             | Exp "+" Exp [group(add)]
             | Exp "-" Exp [group(sub)]

syntax priority mult div > add sub
syntax left mult div
syntax right add sub
Here, the group(_)
attribute is used to create user-defined groups of
sentences. A particular group name collectively refers to the whole set of
sentences within that group. The sets are flattened together, so we could
equivalently have written:
syntax Exp ::= Exp "*" Exp [group(mult)]
             | Exp "/" Exp [group(mult)]
             | Exp "+" Exp [group(add)]
             | Exp "-" Exp [group(add)]

syntax priority mult > add
syntax left mult
syntax right add
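The motivating case for these declarations is a grammar split across modules. The sketch below assumes two hypothetical modules, ARITH-MULT and ARITH-ADD; the names are illustrative, and both operators are made left-associative here purely for the sake of the example.

```k
module ARITH-MULT
  imports INT-SYNTAX

  syntax Exp ::= Int
               | Exp "*" Exp [group(mult)]
endmodule

module ARITH-ADD
  imports ARITH-MULT

  syntax Exp ::= Exp "+" Exp [group(add)]

  // The two productions are not contiguous, so their relationship is
  // declared separately once both groups are in scope.
  syntax priority mult > add
  syntax left mult
  syntax left add
endmodule
```

Because the productions live in different modules, the inline > and left: forms cannot express this relationship; the standalone declarations can.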
Note that syntax [left|right|non-assoc]
should not be used to group together
productions with different priorities. For example, this code would be invalid:
syntax priority mult > add
syntax left mult add
Note that there is one other way to describe associativity, but it is
prone to a very common mistake. You can apply the attribute left
, right
,
or non-assoc
directly to a production to indicate that it is, by itself,
left-, right-, or non-associative.
However, this often does not mean what users think it means. In particular:
syntax Exp ::= Exp "+" Exp [left] | Exp "-" Exp [left]
is not equivalent to:
syntax Exp ::= left: Exp "+" Exp | Exp "-" Exp
Under the first, each production is left-associative with itself, but not with
the other. Thus, x+y+z
will parse unambiguously as (x+y)+z
, but x+y-z
will
be ambiguous. However, in the second, x+y-z
will parse unambiguously as
(x+y)-z
.
Think carefully about how you want your grammar to parse. In general, if you're
not sure, it's probably best to group associativity together into the same
blocks you use for priority, rather than using left
, right
, or non-assoc
attributes on the productions.
Sometimes it is convenient to be able to give a certain regular expression a
name and then refer to it in one or more regular expression terminals. This
can be done with a syntax lexical
sentence in K:
syntax lexical Alphanum = r"[0-9a-zA-Z]"
This defines a lexical identifier Alphanum
which can be expanded in any
regular expression terminal to the above regular expression. For example, I
might choose to then implement the syntax of identifiers as follows:
syntax Id ::= r"[a-zA-Z]{Alphanum}*" [token]
Here {Alphanum}
expands to the above regular expression, making the sentence
equivalent to the following:
syntax Id ::= r"[a-zA-Z]([0-9a-zA-Z])*" [token]
This feature can be used to more modularly construct the lexical syntax of your
language. Note that K does not currently check that lexical identifiers used
in regular expressions have been defined; this will generate an error when
creating the scanner, however, and the user ought to be able to debug what
happened.
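As a further illustration, a lexical identifier can be shared by several token productions. The sketch below is an assumption-laden example: the module name HEX-SKETCH, the lexical identifier HexDigit, and the sorts HexLit and HexByte are hypothetical names, not part of K's builtin syntax.

```k
module HEX-SKETCH
  // Named once, reused in each regular expression terminal below.
  syntax lexical HexDigit = r"[0-9a-fA-F]"

  // One or more hexadecimal digits after a 0x prefix.
  syntax HexLit ::= r"0x{HexDigit}+" [token]

  // Exactly two hexadecimal digits after a backslash-free x prefix.
  syntax HexByte ::= r"x{HexDigit}{HexDigit}" [token]
endmodule
```

Changing the definition of HexDigit (for example, to forbid uppercase letters) then updates both token sorts at once.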
assoc, comm, idem, and unit attributes

These attributes are used to indicate whether a collection or a production
is associative, commutative, idempotent, and/or has a unit.
In general, you should not need to apply these attributes to productions
yourself, however, they do have certain special meaning to K. K will generate
axioms related to each of these concepts into your definition for you
automatically. It will also automatically sort associative-commutative
collections, and flatten the indentation of associative collections, when
unparsing.
public and private attribute

K allows users to declare certain pieces of syntax as either public or private.
All syntax is public by default. Public syntax can be used from any module that
imports that piece of syntax. A piece of syntax can be declared private with
the private
attribute. This means that that syntax can only be used in the
module in which it is declared; it is not visible from modules that import
that module.
You can also change the default visibility of a module with the private
attribute, when it is placed directly on a module. A module with the private
attribute has all syntax private
by default; this can be overridden on
specific sentences with the public
attribute.
Note that the private
module attribute also changes the default visibility
of imports; please refer to the appropriate section elsewhere in the manual
for more details.
Here is an example usage:
module WIDGET-SYNTAX

    syntax Widget ::= foo()
    syntax WidgetHelper ::= bar() [private] // this production is not visible
                                            // outside this module
endmodule

module WIDGET [private]
    imports WIDGET-SYNTAX

    syntax Widget ::= fooImpl() // this production is not visible outside this
                                // module

    // this production is visible outside this module
    syntax KItem ::= adjustWidget(Widget) [function, public]
endmodule
exit attribute

A single configuration cell containing an integer may have the "exit"
attribute. This integer will then be used as the return value on the console
when executing the program.
For example:
configuration <k> $PGM:Pgm </k>
              <status-code exit=""> 1 </status-code>
declares that the cell status-code
should be used as the exit-code for
invocations of krun
. Additionally, we state that the default exit-code is 1
(an error state). One use of this is for writing testing harnesses which assume
that the test fails until proven otherwise and only set the <status-code>
cell
to 0
if the test succeeds.
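The testing-harness pattern described above can be sketched as follows. The module name EXIT-SKETCH and the pass and fail programs are hypothetical; the sketch also writes the empty computation as .K, which may need to be . in older versions of K.

```k
module EXIT-SKETCH
  imports INT

  // Hypothetical two-program language used only for this sketch.
  syntax Pgm ::= "pass" | "fail"

  configuration <k> $PGM:Pgm </k>
                <status-code exit=""> 1 </status-code>

  // Only a passing test clears the pessimistic default of 1.
  rule <k> pass => .K ... </k>
       <status-code> _ => 0 </status-code>
endmodule
```

Since no rule applies to fail, execution of that program gets stuck with the <status-code> cell still holding 1, which then becomes the exit code of krun.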
multiplicity and type attributes

Sometimes a semantics needs to allow multiple copies of the same cell, for
example if you are making a concurrent multi-threading programming language.
For this purpose, K supports the multiplicity
and type
attributes on cells
declared in the configuration.
multiplicity
can take on values *
and ?
. Declaring multiplicity="*"
indicates that the cell may appear any number of times in a runtime
configuration. Setting multiplicity="?"
indicates that the cell may
appear at most once (0 or 1 times) in a runtime configuration. If there are no
configuration variables present in the cell collection, the initial
configuration will start with exactly 0 instances of the cell collection. If
there are configuration variables present in the cell collection, the initial
configuration will start with exactly 1 instance of the cell collection.
type
can take on values Set
, List
, and Map
. For example, here we declare
several collection cells:
configuration <k> $PGM:Pgm </k>
              <sets>
                <set multiplicity="?" type="Set"> 0:Int </set>
              </sets>
              <lists>
                <list multiplicity="*" type="List"> 0:Int </list>
              </lists>
              <maps>
                <map multiplicity="*" type="Map">
                  <map-key> 0:Int </map-key>
                  <map-value-1> "":String </map-value-1>
                  <map-value-2> 0:Int </map-value-2>
                </map>
              </maps>
Declaring type="Set"
indicates that duplicate occurrences of the cell should
be de-duplicated, and accesses to instances of the cell will be nondeterministic
choices (constrained by any other parts of the match and side-conditions).
Similarly, declaring type="List"
means that new instances of the cell can be
added at the front or back, and elements can be accessed from the front or back,
and the order of the cells will be maintained. The following are examples of
introduction and elimination rules for these collections:
rule <k> introduce-set(I:Int) => . ... </k>
     <sets> .Bag => <set> I </set> </sets>

rule <k> eliminate-set => I ... </k>
     <sets> <set> I </set> => .Bag </sets>

rule <k> introduce-list-start(I:Int) => . ... </k>
     <lists> (.Bag => <list> I </list>) ... </lists>

rule <k> introduce-list-end(I:Int) => . ... </k>
     <lists> ... (.Bag => <list> I </list>) </lists>

rule <k> eliminate-list-start => I ... </k>
     <lists> (<list> I </list> => .Bag) ... </lists>

rule <k> eliminate-list-end => I ... </k>
     <lists> ... (<list> I </list> => .Bag) </lists>
Notice that for multiplicity="?"
, we only admit a single <set>
instance at
a time. For the type="List"
cell, we can add/eliminate cells from the front or
back of the <lists>
cell. Also note that we use .Bag
to indicate the empty
cell collection in all cases.
Declaring type="Map"
indicates that the first sub-cell will be used as a
cell-key. This means that matching on those cells will be done as a map-lookup
operation if the cell-key is mentioned in the rule (for performance). If the
cell-key is not mentioned, it will fall back to normal nondeterministic matching,
constrained by other parts of the match and any side-conditions. Note that there
is no special meaning to the name of the cells (in this case <map>
,
<map-key>
, <map-value-1>
, and <map-value-2>
). Additionally, any number of
sub-cells are allowed, and the entire instance of the cell collection is
considered part of the cell-value, including the cell-key (<map-key>
in this
case) and the surrounding collection cell (<map>
in this case).
For example, the following rules introduce, set, retrieve from, and eliminate
type="Map"
cells:
rule <k> introduce-map(I:Int) => . ... </k>
     <maps> ... (.Bag => <map> <map-key> I </map-key> ... </map>) ... </maps>

rule <k> set-map-value-1(I:Int, S:String) => . ... </k>
     <map> <map-key> I </map-key> <map-value-1> _ => S </map-value-1> ... </map>

rule <k> set-map-value-2(I:Int, V:Int) => . ... </k>
     <map> <map-key> I </map-key> <map-value-2> _ => V </map-value-2> ... </map>

rule <k> retrieve-map-value-1(I:Int) => S ... </k>
     <map> <map-key> I </map-key> <map-value-1> S </map-value-1> ... </map>

rule <k> retrieve-map-value-2(I:Int) => V ... </k>
     <map> <map-key> I </map-key> <map-value-2> V </map-value-2> ... </map>

rule <k> eliminate-map(I:Int) => . ... </k>
     <maps> ... (<map> <map-key> I </map-key> ... </map> => .Bag) ... </maps>
Note how each rule makes sure that <map-key>
cell is mentioned, and we
continue to use .Bag
to indicate the empty collection. Also note that
when introducing new map elements, you may omit any of the sub-cells which are
not the cell-key. In case you do omit sub-cells, you must use structural
framing ...
to indicate the missing cells; they will receive the default
value given in the configuration ...
declaration.
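The multi-threading use case mentioned at the start of this section can be sketched with a multiplicity="*" cell. Everything named here (the module THREADS-SKETCH, the spawn and done statements, and the cell names) is hypothetical and kept deliberately small; a real semantics would carry more per-thread state. The empty computation is written .K, which may need to be . in older versions of K.

```k
module THREADS-SKETCH
  syntax Stmt ::= "spawn" Stmt
                | "done"

  // $PGM occurs inside the collection, so execution starts with
  // exactly one <thread> instance.
  configuration
    <threads>
      <thread multiplicity="*" type="Set">
        <k> $PGM:Stmt </k>
      </thread>
    </threads>

  // Create a new <thread> cell next to the spawning one.
  rule <thread> <k> spawn S => .K ... </k> </thread>
       (.Bag => <thread> <k> S </k> </thread>)

  // Remove a thread once its computation is exactly `done`.
  rule (<thread> <k> done </k> </thread> => .Bag)
endmodule
```

As in the rules above, .Bag stands for the empty cell collection on whichever side of the rewrite no cell is present.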
Each K rule follows the same basic structure (given as an example here):
rule LHS => RHS requires REQ ensures ENS [ATTRS]
The portion between rule
and requires
is referred to as the rule body,
and may contain one or more rewrites (though not nested). Here, the rule body is
LHS => RHS
, where LHS
and RHS
are used as placeholders for the pre- and
post- states. Note that we lose no generality referring to the LHS
or the
RHS
, even in the presence of multiple rewrites, as the rewrites are pulled to
the top-level anyway.
Next is the requires clause, represented here as REQ
. The requires clause is
an additional predicate (function-like term of sort Bool
), which is to be
evaluated before applying the rule. If the requires clause does not evaluate to
true
, then the rule does not apply.
Finally is the ensures clause, represented here as ENS
. The ensures clause
is to be interpreted as a post-condition, and will be automatically added to the
path condition if the rule applies. It may cause the entire term to become
undefined, but the backend will not stop itself from applying the rule in this
case. Note that concrete backends (e.g., the LLVM backend) are free to ignore the
ensures clause.
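The two clauses can be seen side by side in the following sketch. The module name RULE-SHAPE-SKETCH and the div and pickPositive constructs are hypothetical. The second rule uses an existential variable ?X, which is meaningful mainly for symbolic backends; as noted above, a concrete backend may ignore the ensures clause.

```k
module RULE-SHAPE-SKETCH
  imports INT
  imports BOOL

  syntax Exp ::= Int
               | div(Int, Int)
               | "pickPositive"

  configuration <k> $PGM:Exp </k>

  // requires: the rule applies only when the divisor is non-zero.
  rule <k> div(A, B) => A /Int B ... </k>
    requires B =/=Int 0

  // ensures: the post-condition ?X >Int 0 is added to the path
  // condition after the rule applies.
  rule <k> pickPositive => ?X:Int ... </k>
    ensures ?X >Int 0
endmodule
```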
Overall, the transition represented by such a rule is from a state
LHS #And REQ
ending in a state RHS #And ENS
. When backends apply this rule
as a transition/rewrite, they should:
1. Check that LHS matches (or unifies) with the current term, giving a substitution alpha.
2. Check that alpha(REQ) is valid (or satisfiable).
3. Form the new state alpha(RHS #And ENS), and check if it's satisfiable.

Sometimes when you want to express a side condition, you want to say that a
rule matches if a particular term matches a particular pattern, or if it
instead does /not/ match a particular pattern.
The syntax in K for this is :=K and :/=K. It has similar meaning to ==K and
=/=K, except that where ==K and =/=K express equality, :=K and :/=K express
model membership. That is to say, whether or not the rhs is a member of the set
of terms expressed by the lhs pattern. Because the lhs of these operators is a
pattern, the user can use variables in the lhs of the operator. However, due to
current limitations, these variables are NOT bound in the rest of the term.
The user is thus encouraged to use anonymous variables only, although this is
not required.
This is compiled by the K frontend down to an efficient pattern matching on a
fresh function symbol.
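A minimal sketch of both operators in side conditions follows. The module name MATCH-SKETCH, the sort Val, and the function isPairVal are hypothetical; the operators themselves are assumed to come from the builtin K-EQUAL module. In line with the advice above, only anonymous variables appear in the patterns.

```k
module MATCH-SKETCH
  imports BOOL
  imports K-EQUAL

  syntax Val ::= pair(Val, Val)
               | "unit"

  // Hypothetical predicate: true exactly when the argument is a pair.
  syntax Bool ::= isPairVal(Val) [function, total]
  rule isPairVal(V) => true  requires pair(_, _) :=K V
  rule isPairVal(V) => false requires pair(_, _) :/=K V
endmodule
```

The pattern goes on the left of the operator and the term being tested goes on the right.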
There are a number of cases in K where you would prefer to be able to take some
term on the RHS, bind it to a variable, and refer to it in multiple different
places in a rule.
You might also prefer to take a variable for which you know some of its
structure, and modify some of its internal structure without requiring you to
match on every single field contained inside that structure.
In order to do this, we introduce syntax to K that allows you to construct
anonymous functions in the RHS of a rule and apply them to a term.
The syntax for this is:
#fun(RuleBody)(Argument)
Note the limitations currently imposed by the implementation. These functions
are not first-class: you cannot bind them to a variable and inject them like
you can with a regular klabel for a function. You also cannot express multiple
rules or multiple parameters, or side conditions. All of these are extensions
we would like to support in the future, however.
In the following, we use three examples to illustrate the behavior of #fun
.
We point out that the support for #fun
is provided by the frontend, not the
backends.
The three examples are real examples borrowed or modified from existing language
semantics.
Example 1 (A Simple Self-Explained Example).
#fun(V:Val => isFoo(V) andBool isBar(V))(someFunctionReturningVal())
Example 2 (Nested #fun).
#fun(C
=> #fun(R
=> #fun(E
=> foo1(E, R, C)
)(foo2(C))
)(foo3(0))
)(foo4(1))
This example is from the beacon semantics:
https://github.com/runtimeverification/beacon-chain-spec/blob/master/beacon-chain.k
at line 302, with some modification for simplicity. Note how
variables C, R, E
are bound in the nested #fun
.
Example 3 (Matching a structure).
rule foo(K, RECORD) =>
#fun(record(... field: _ => K))(RECORD)
Unlike previous examples, the LHS of #fun
in this example is no longer a
variable, but a structure. It has the same spirit as the first two examples,
but we match the RECORD
with a structure record( DotVar, field: X)
, instead
of a standalone variable. We also use K's local rewrite syntax (i.e., the
rewriting symbol =>
does not occur at the top-level) to prevent writing
duplicate expressions on the LHS and RHS of the rewriting.
A production can be tagged with the macro
, alias
, macro-rec
, or alias-rec
attributes. In all cases, what this signifies is that this is a macro production.
Macro rules are rules where the top symbol of the left-hand side is a macro
label. Macro rules are applied statically during compilation on all terms that
they match, and statically before program execution on the initial configuration.
Currently, macro rules are required to not have side conditions, although they
can contain sort checks.
alias
rules are also applied statically in reverse prior to unparsing on the
final configuration. Note that a macro rule can have unbound variables in the
right hand side. When such a macro exists, it should be used only on the left
hand side of rules, unless the user is performing symbolic execution and expects
to introduce symbolic terms into the subject being rewritten.
However, when used on the left hand side of a rule, it functions similarly to a
pattern alias, and allows the user to concisely express a reusable pattern that
they wish to match on in multiple places.
For example, consider the following semantics:
syntax KItem ::= "foo" [alias] | "foobar"
syntax KItem ::= bar(KItem) [macro] | baz(Int, KItem)
rule foo => foobar
rule bar(I) => baz(?_, I)
rule bar(I) => I
This will rewrite baz(0, foo)
to foo
. First baz(0, foo)
will be rewritten
statically to baz(0, foobar)
. Then the non-macro
rule will apply (because
the rule will have been rewritten to rule baz(_, I) => I
). Then foobar
will
be rewritten statically after rewriting finishes to foo
via the reverse form
of the alias.
Note that macros do not apply recursively within their own expansion. This is
done so as to ensure that macro expansion will always terminate. If the user
genuinely desires a recursive macro, the macro-rec
and alias-rec
attributes
can be used to provide this behavior.
For example, consider the following semantics:
syntax Exp ::= "int" Exp ";" | "int" Exps ";" [macro] | Exp Exp | Id
syntax Exps ::= List{Exp,","}
rule int X:Id, X':Id, Xs:Exps ; => int X ; int X', Xs ;
This will expand int x, y, z;
to int x; int y, z;
because the macro does
not apply the second time after applying the substitution of the first
application. However, if the macro
attribute were changed to the macro-rec
attribute, it would instead expand (as the user likely intended) to
int x; int y; int z;
.
The alias-rec
attribute behaves with respect to the alias
attribute the
same way the macro-rec
attribute behaves with respect to macro
.
anywhere rules

Some rules are not functional, but you want them to apply anywhere in the
configuration (similar to functional rules). You can use the anywhere
attribute on a rule to instruct the backends to make sure they apply anywhere
they match in the entire configuration.
For example, if you want to make sure that some associative operator is always
right-associated anywhere in the configuration, you can do:
syntax Stmt ::= Stmt ";" Stmt
rule (S1 ; S2) ; S3 => S1 ; (S2 ; S3) [anywhere]
Then after every step, all occurrences of _;_
will be re-associated. Note that
this allows the symbol _;_
to still be a constructor, even though it is
simplified similarly to a function
.
trusted claims

You may add the trusted
attribute to a given claim for the K prover to
automatically add it to the list of proven circularities, instead of trying to
discharge it separately.
K automatically generates certain predicate and projection functions from the
syntax you declare. For example, if you write:
syntax Foo ::= foo(bar: Bar)
It will automatically generate the following K code:
syntax Bool ::= isFoo(K) [function]
syntax Foo ::= "{" K "}" ":>Foo" [function]
syntax Bar ::= bar(Foo) [function]
rule isFoo(F:Foo) => true
rule isFoo(_) => false [owise]
rule { F:Foo }:>Foo => F
rule bar(foo(B:Bar)) => B
The first two types of functions are generated automatically for every sort in
your K definition, and the third type of function is generated automatically
for each named nonterminal in your definition. Essentially, isFoo
for some
sort Foo
will tell you whether a particular term of sort K
is a Foo
,
{F}:>Foo
will cast F
to sort Foo
if F
is of sort Foo
and will be
undefined (i.e., theoretically defined as #Bottom
, the bottom symbol in
matching logic) otherwise. Finally, bar
will project out the child of a foo
named bar
in its production declaration.
Note that if another term of equal or smaller sort to Foo
exists and has a
child named bar
of equal or smaller sort to Bar
, this will generate an
ambiguity during parsing, so care should be taken to ensure that named
nonterminals are sufficiently unique from one another to prevent such
ambiguities. Of course, the compiler will generate a warning in this case.
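For readers more familiar with general-purpose languages, the three generated functions behave roughly like a type test, a checked downcast, and a field accessor. The Python sketch below is only an analogy: the class and function names are hypothetical, and raising an exception stands in for the cast being #Bottom.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bar:
    value: int


@dataclass(frozen=True)
class Foo:
    bar: Bar  # plays the role of the named nonterminal `bar: Bar`


def is_foo(term: object) -> bool:
    """Counterpart of isFoo: true only for terms of sort Foo."""
    return isinstance(term, Foo)


def cast_foo(term: object) -> Foo:
    """Counterpart of {_}:>Foo; the exception stands in for #Bottom."""
    if not is_foo(term):
        raise ValueError("projection cast is undefined on this term")
    return term


def project_bar(term: Foo) -> Bar:
    """Counterpart of the generated projection function bar(_)."""
    return term.bar


print(project_bar(cast_foo(Foo(Bar(3)))))  # Bar(value=3)
```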
simplification attribute
The simplification attribute identifies rules outside the main semantics that
are used to simplify function patterns.
Conditions: A simplification rule is applied by matching the function
arguments, instead of unification as when applying function definition
rules. This allows function symbols to appear nested as arguments to other
functions on the left-hand side of a simplification rule, which is forbidden in
function definition rules. For example, this rule would not be accepted as a
function definition rule:
rule (X +Int Y) +Int Z => X +Int (Y +Int Z) [simplification]
A simplification rule is only applied when the current side condition implies
the requires
clause of the rule, like function definition rules.
Order: The simplification
attribute accepts an optional integer argument
which is the rule's simplification priority; if the optional argument is not
specified, it is equivalent to a simplification priority of 50. Backends
should attempt simplification rules in order of their simplification
priority, but are not required to do so; in fact, the backend is free to apply
simplification
rules at any time. Because of this, users must ensure that
simplification rules are sound regardless of their order of application. This
differs from the priority
attribute in that rules with the priority
attribute must be applied in their priority order by the backend. It is an
error to have the priority
attribute on a simplification
rule.
For example, for the following definition:
syntax WordStack ::= Int ":" WordStack | ".WordStack"
syntax Int ::= sizeWordStack ( WordStack ) [function]
             | sizeWordStackAux ( WordStack , Int ) [function]
// --------------------------------------------------------------
rule sizeWordStack(WS) => sizeWordStackAux(WS, 0)
rule sizeWordStackAux(.WordStack, N) => N
rule sizeWordStackAux(W : WS , N) => sizeWordStackAux(WS, N +Int 1)
We might add the following simplification lemma:
rule sizeWordStackAux(WS, N) => N +Int sizeWordStackAux(WS, 0)
  requires N =/=Int 0 [simplification]
Then this simplification rule will only apply if the Haskell backend can prove
that notBool N =/=Int 0
is unsatisfiable. This avoids an infinite cycle of
applying this simplification lemma.
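Since soundness of a lemma is the developer's responsibility, it can help to test it on concrete inputs before adding it. The Python sketch below mirrors the two functions (the Python names are hypothetical stand-ins, and word stacks are modeled as lists) and checks that both sides of the lemma agree on a few inputs. This is a spot check, not a proof.

```python
def size_word_stack_aux(ws: list[int], n: int) -> int:
    """Mirror of sizeWordStackAux: count elements, starting from accumulator n."""
    for _ in ws:
        n += 1
    return n


def size_word_stack(ws: list[int]) -> int:
    """Mirror of sizeWordStack."""
    return size_word_stack_aux(ws, 0)


# Both sides of the lemma agree for every accumulator value tried here,
# including 0, where the requires clause prevents the rule from applying.
for ws in ([], [7], [1, 2, 3]):
    for n in range(4):
        assert size_word_stack_aux(ws, n) == n + size_word_stack_aux(ws, 0)

print(size_word_stack([1, 2, 3]))  # 3
```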
NOTE: The frontend and Haskell backend do not check that supplied
simplification rules are sound; this is the developer's responsibility. In
particular, rules with the simplification attribute must preserve definedness;
that is, if the left-hand side refers to any partial function then:
- the right-hand side must be #Bottom when the left-hand side is #Bottom, or
- the rule must have an ensures clause that is false when the left-hand side
  is #Bottom, or
- the rule must have a requires clause that is false when the left-hand side
  is #Bottom.
These conditions are in order of decreasing preference: the best option is to
preserve #Bottom
on the right-hand side, the next best option is to have an
ensures
clause, and the least-preferred option is to have a requires
clause.
The most preferred option is to write total functions and avoid the entire issue.
NOTE: The Haskell backend does not attempt to prove claims whose right-hand
side is #Bottom
. The reason for this is that the general case is undecidable,
and the backend might enter an infinite loop. Therefore, the backend emits a
warning if it encounters such a claim.
concrete and symbolic attributes (Haskell backend)
Users can control the application of simplification
rules using the concrete
and the symbolic
attributes by specifying the type of patterns the rule's
arguments are to match.
A concrete pattern is a pattern which does not contain variables or unevaluated
functions, otherwise the pattern is symbolic.
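The semantics of the two attributes is defined as follows:
- If a simplification rule is marked concrete, then all arguments must be
  concrete for the rule to match.
- If a simplification rule is marked symbolic, then all arguments must be
  symbolic for the rule to match.
- concrete(<variables>) (resp. symbolic(<variables>)), where <variables>
  is a list of variable names separated by commas, can be used to require
  that only the named variables be concrete (resp. symbolic).
For example, the following will only match when all arguments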
are concrete:
rule X +Int (Y +Int Z) => (X +Int Y) +Int Z [simplification, concrete]
Conversely, the following will only match when all arguments
are symbolic:
rule X +Int (Y +Int Z) => (X +Int Y) +Int Z [simplification, symbolic]
In practice, the following rules will re-associate and commute terms to combine
concrete arguments:
rule (A +Int Y) +Int Z => A +Int (Y +Int Z)
  [concrete(Y, Z), symbolic(A), simplification]
rule X +Int (B +Int Z) => B +Int (X +Int Z)
  [concrete(X, Z), symbolic(B), simplification]
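The per-variable check can be pictured with a small Python model. It is a sketch under stated assumptions, not the Haskell backend's implementation: Var, App, is_concrete, and attribute_allows are hypothetical names, and Python integers stand in for domain values. A term counts as concrete when it contains no variables and no unevaluated function applications, as defined above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class App:
    symbol: str
    args: tuple
    is_function: bool = False  # True for an unevaluated function symbol


def is_concrete(term) -> bool:
    """A concrete pattern has no variables and no unevaluated functions."""
    if isinstance(term, Var):
        return False
    if isinstance(term, App):
        return not term.is_function and all(is_concrete(a) for a in term.args)
    return True  # literals such as Python ints stand in for domain values


def attribute_allows(bindings: dict, concrete=(), symbolic=()) -> bool:
    """Check concrete(...) and symbolic(...) constraints against a match."""
    return (all(is_concrete(bindings[v]) for v in concrete)
            and all(not is_concrete(bindings[v]) for v in symbolic))


# Candidate matches for: rule (A +Int Y) +Int Z => ... [concrete(Y, Z), symbolic(A)]
ok = {"A": Var("x"), "Y": 1, "Z": 2}
rejected = {"A": 3, "Y": 1, "Z": 2}
print(attribute_allows(ok, concrete=("Y", "Z"), symbolic=("A",)))        # True
print(attribute_allows(rejected, concrete=("Y", "Z"), symbolic=("A",)))  # False
```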
unboundVariables attribute
Normally, K rules are not allowed to contain regular (i.e., not fresh, not
existential) variables in the RHS / requires
/ ensures
clauses which are not
bound in the LHS.
However, in certain cases this behavior might be desired, like, for example,
when specifying a macro rule which is to be used in the LHS of other rules.
To allow for such cases while still performing the unboundness checks in
regular cases, the unboundVariables attribute allows the user to specify
a comma-separated list of names of variables which can be unbound in the rule.
For example, in the macro declaration
rule cppEnumType => bar(_, scopedEnum() #Or unscopedEnum() ) [unboundVariables(_)]
the declaration unboundVariables(_)
allows the rule to pass the unbound
variable checks, and this in turn allows for cppEnumType
to be used in
the LHS of a rule to mean the pattern above:
rule inverseConvertType(cppEnumType, foo((cppEnumType #as T::CPPType => underlyingType(T))))
memo attribute
The memo
attribute is a hint from the user to the backend to memoize a
function. Not all backends support memoization, but when the attribute is used
and the definition is compiled for a memo
-supporting backend, then calls to
the function may be cached. At the time of writing, only the Haskell
backend supports memoization.
The Haskell backend will only cache a function call if all arguments are concrete.
It is recommended not to memoize recursive functions, as each recursive call
will be stored in the cache, but only the first iteration will be retrieved from
the cache; that is, the cache will be filled with many unreachable
entries. Instead, we recommend performing a worker-wrapper transformation on
recursive functions, and apply the memo
attribute to the wrapper.
Warning: A function declared with the memo
attribute must not use
uninterpreted functions in the side-condition of any rule. Memoizing such an
impure function is unsound. To see why, consider the following rules:
syntax Bool ::= impure( Int ) [function]
syntax Int ::= unsound( Int ) [function, memo]
rule unsound(X:Int) => X +Int 1 requires impure(X)
rule unsound(X:Int) => X requires notBool impure(X)
Because the function impure
is not given rules to cover all inputs, unsound
can be memoized incoherently. For example,
{unsound(0) #And {impure(0) #Equals true}} #Equals 1
but
{unsound(0) #And {impure(0) #Equals false}} #Equals 0
The memoized value of unsound(0)
would be incoherently determined by which
pattern the backend encounters first.
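The same incoherence can be reproduced in a few lines of Python. This is an illustrative model only: the cache dictionary and the way impure is passed in as a parameter are assumptions made for the sketch, standing in for two proof contexts that make different assumptions about the uninterpreted function.

```python
cache: dict[int, int] = {}


def unsound(x: int, impure) -> int:
    """Memoized on x alone, although the result also depends on impure(x)."""
    if x not in cache:
        cache[x] = x + 1 if impure(x) else x
    return cache[x]


# First context assumes impure(0) is true; the value 1 is cached.
first = unsound(0, impure=lambda _: True)
# Second context assumes impure(0) is false, but the cached value is reused.
second = unsound(0, impure=lambda _: False)
print(first, second)  # 1 1
```

Had the second call been evaluated first, both calls would have returned 0 instead, which is the order dependence described above.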
In K, it is not required that users declare the sorts of variables in rules or
in the initial configuration. If the user does not explicitly declare the sort
of a variable somewhere via a cast (see below), the sort of the variable is
inferred from context based on the sort signature of every place the variable
appears in the rule.
As an example, consider the rule for addition in IMP:
syntax Exp ::= Exp "+" Exp | Int
rule I1 + I2 => I1 +Int I2
Here +Int
is defined in the INT module with the following signature:
syntax Int ::= Int "+Int" Int [function]
In the rule above, the sort of both I1
and I2
is inferred as Int
. This is because
a variable must have the same sort every place it appears within the same rule.
While a variable appearing only on the left-hand-side of the rule could have
sort Exp
instead, the same variable appears as a child of +Int
, which
constrains the sorts of I1
and I2
more tightly. Since the sort must be a
subsort of Int
or equal to Int
, and Int
has no subsorts, we infer Int
as the sorts of I1
and I2
. This means that the above rule will not match
until I1
and I2
become integers (i.e., have already been evaluated).
More complex examples are possible, however:
syntax Exp ::= Exp "+" Int | Int
rule _ + _ => 0
Here we have two anonymous variables. They do not refer to the same variable
as one another, so they can have different sorts. The right side is constrained
by +
to be of sort Int
, but the left side could be either Exp
or Int
.
When this occurs, we have multiple solutions to the sorts of the variables in
the rule. K will only choose solutions which are maximal, however. To be
precise, if two different solutions exist, but the sorts of one solution are
all greater than or equal to the sorts of the other solution, K will discard
the smaller solution. Thus, in the case above, the variable on the left side
of the +
is inferred of sort Exp
, because the solution (Exp
, Int
) is
strictly greater than the solution (Int
, Int
).
It is possible, however, for terms to have multiple maximal solutions:
syntax Exp ::= Exp "+" Int | Int "+" Exp | Int
rule I1 + I2 => 0
In this example, there is an ambiguous parse. This could parse as either
the first +
or the second. In the first case, the maximal solution chosen is
(Exp
, Int
). In the second, it is (Int
, Exp
). Neither of these solutions is
greater than the other, so both are allowed by K. As a result, this program
will emit an error because the parse is ambiguous. To pick one solution over
the other, a cast or a prefer
or avoid
attribute can be used.
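The notion of a maximal solution can be made concrete with a short Python model. It is a simplified sketch, not K's actual inference algorithm: the SUBSORT table, the function names, and the representation of a solution as a tuple of sort names are assumptions made for illustration, and the candidates from both productions are pooled into a single list.

```python
from itertools import product

# Subsort relation for the example: Int < Exp (reflexivity handled in leq).
SUBSORT = {("Int", "Exp")}


def leq(a: str, b: str) -> bool:
    """True when sort a is less than or equal to sort b."""
    return a == b or (a, b) in SUBSORT


def solutions(expected: tuple) -> list:
    """All assignments whose sorts fit under the expected sort per position."""
    sorts = ["Int", "Exp"]
    return [s for s in product(sorts, repeat=len(expected))
            if all(leq(x, e) for x, e in zip(s, expected))]


def maximal(cands: list) -> list:
    """Drop any solution that is pointwise below a different solution."""
    def below(s, t):
        return s != t and all(leq(x, y) for x, y in zip(s, t))
    return [s for s in cands if not any(below(s, t) for t in cands)]


# Exp "+" Int: a single maximal solution survives.
print(maximal(solutions(("Exp", "Int"))))  # [('Exp', 'Int')]

# Exp "+" Int | Int "+" Exp: two incomparable maximal solutions remain.
both = sorted(set(solutions(("Exp", "Int")) + solutions(("Int", "Exp"))))
print(maximal(both))  # [('Exp', 'Int'), ('Int', 'Exp')]
```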
There are three main types of casts in K: the semantic cast, the strict cast,
and the projection cast.
For every sort S
declared in your grammar, K will define the following
production for you for use in rules:
syntax S ::= S ":S"
The meaning of this cast is that the sort of the term inside the cast must be
less than or equal to S. This can be used to resolve ambiguities, but its
principal purpose is to guide execution by telling K what sort variables must
match in order for the rule to apply. When compiled, it will generate a pattern
that matches on an injection into S.
K also introduces the strict cast:
syntax S ::= S "::S"
The meaning at runtime is exactly the same as the semantic cast; however, it
restricts the sort of the term inside the cast to exactly S. That is
to say, if you use it on something that is a strictly smaller sort, it will
generate a type error. This is useful in certain circumstances to help
disambiguate terms, when a semantic cast would not have resolved the ambiguity.
As such, it is primarily used to solve ambiguities rather than to guide
execution.
K also introduces the projection cast:
syntax {S2} S ::= "{" S2 "}" ":>S"
The meaning of this cast at runtime is that if the term inside is of sort S,
its injection is stripped away and the value inside is returned as a term of
static sort S. However, if the term is of a
different sort, it is an error and execution will get stuck. Thus the primary
usefulness of this cast is to cast the return value of a function with a
greater sort down to a strictly smaller sort that you expect the return value
of the function to have. For example:
syntax Exp ::= foo(Exp) [function] | bar(Int) | Int
rule foo(I:Int) => I
rule bar(I) => bar({foo(I +Int 1)}:>Int)
Here we know that foo(I +Int 1)
will return an Int, but the return sort of
foo
is Exp
. So we project the result into the Int
sort so that it can
be placed as the child of a bar
.
owise and priority attributes
Sometimes, it is simply not convenient to explicitly describe every
single negative case under which a rule should not apply. Instead,
we simply wish to say that a rule should only apply after some other set of
rules have been tried. K introduces two different attributes that can be
added to rules which will automatically generate the necessary matching
conditions in a manner which is performant for concrete execution (indeed,
it generally outperforms during concrete execution code where the conditions
are written explicitly).
The first is the owise
attribute. Very roughly, rules without an attribute
indicating their priority apply first, followed by rules with the owise
attribute only if all the other rules have been tried and failed. For example,
consider the following function:
syntax Int ::= foo(Int) [function]
rule foo(0) => 0
rule foo(_) => 1 [owise]
Here foo(0)
is defined explicitly as 0
. Any other integer yields the
integer 1
. In particular, the second rule above will only be tried after the
first rule has been shown not to apply.
This is because the first rule has a lower number assigned for its priority
than the second rule. In practice, each rule in your semantics is implicitly
or explicitly assigned a numerical priority. Rules are tried in increasing
order of priority, starting at zero and trying each increasing numerical value
successively.
You can specify the priority of a rule with the priority
attribute. For
example, the second rule above could equivalently be written as:
rule foo(_) => 1 [priority(200)]
The number 200
is not chosen at random. In fact, when you use the owise
attribute, what you are doing is implicitly setting the priority of the rule
to 200
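. This has a couple of implications:
- Multiple rules with the owise attribute all have the same priority and thus
  can apply in any order.
- Rules with a priority higher than 200 apply after all rules with the owise
  attribute have been tried.
There is one more rule by which priorities are assigned: a rule with no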
attributes indicating its priority is assigned the priority 50. Thus,
with each priority explicitly declared, the above example looks like:
syntax Int ::= foo(Int) [function]
rule foo(0) => 0 [priority(50)]
rule foo(_) => 1 [owise]
One final note: the llvm backend reserves priorities between 50 and 150
inclusive for certain specific purposes. Because of this, explicit
priorities which are given within this region may not behave precisely as
described above. This is primarily in order that it be possible where necessary
to provide guidance to the pattern matching algorithm when it would otherwise
make bad choices about which rules to try first. You generally should not
give any rule a priority within this region unless you know exactly what the
implications are with respect to how the llvm backend orders matches.
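The ordering described in this section can be sketched as a small Python dispatcher. This is a model of the documented behavior, not of any backend's pattern-matching algorithm: the constants, the rule tuples, and apply_rules are hypothetical names chosen for the sketch.

```python
OWISE = 200    # priority implied by the owise attribute
DEFAULT = 50   # priority of a rule with no priority-related attribute

# Each rule is (priority, predicate on the argument, result builder).
foo_rules = [
    (OWISE, lambda n: True, lambda n: 1),      # rule foo(_) => 1 [owise]
    (DEFAULT, lambda n: n == 0, lambda n: 0),  # rule foo(0) => 0
]


def apply_rules(rules, arg):
    """Try rules in increasing priority order; the first match wins."""
    for _, matches, result in sorted(rules, key=lambda r: r[0]):
        if matches(arg):
            return result(arg)
    raise ValueError("no rule applies; evaluation is stuck")


print(apply_rules(foo_rules, 0), apply_rules(foo_rules, 7))  # 0 1
```

Although the owise rule is listed first, it is consulted only after the priority-50 rule has failed to match.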
strict and seqstrict attributes
The strictness attributes allow defining evaluation strategies without having
to explicitly make rules which implement them. This is done by injecting
heating and cooling rules for the subterms. For this to work, you need to
define what a result is for K, by extending the KResult
sort.
For example:
syntax AExp ::= Int
              | AExp "+" AExp [strict, klabel(addExp)]
This generates two heating rules (where the hole syntaxes "[]" "+" AExp and
AExp "+" "[]" are automatically added to create an evaluation context):
rule [addExp1-heat]: <k> HOLE:AExp + AE2:AExp => HOLE ~> [] + AE2 ... </k> [heat]
rule [addExp2-heat]: <k> AE1:AExp + HOLE:AExp => HOLE ~> AE1 + [] ... </k> [heat]
And two corresponding cooling rules:
rule [addExp1-cool]: <k> HOLE:AExp ~> [] + AE2 => HOLE + AE2 ... </k> [cool]
rule [addExp2-cool]: <k> HOLE:AExp ~> AE1 + [] => AE1 + HOLE ... </k> [cool]
Note that the rules are given labels based on the klabel of the production, which
nonterminal is the hole, and whether it's the heating or the cooling rule.
You will note that these rules can apply one after another infinitely. In
practice, the KResult
sort is used to break this cycle by ensuring that only
terms that are not part of the KResult
sort will be heated. The heat
and
cool
attributes are used to tell the compiler that these are heating and
cooling rules and should be handled in the manner just described. Nothing stops
the user from writing such heating and cooling rules directly if they wish,
although we describe other more convenient syntax for most of the advanced
cases below.
One other thing to note is that in the above sentences, HOLE
is just a
variable, but it has special meaning in the context of sentences with the
heat
or cool
attribute. In heating or cooling rules, the variable named
HOLE
is considered to be the term being heated or cooled and the compiler
will generate isKResult(HOLE)
and notBool isKResult(HOLE)
side conditions
appropriately to ensure that the backend does not loop infinitely. The module
BOOL
will also be automatically and privately included for semantic
purposes. The syntax for parsing programs will not be affected.
In order for this functionality to work, you need to define the KResult
sort.
For instance, we tell K that a term is fully evaluated once it becomes an Int
here:
syntax KResult ::= Int
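To see how heating, cooling, and the KResult check interact, the following Python sketch simulates the k cell as a list. It is a simplified model under stated assumptions: terms are tuples of the form ("+", a, b), the freezer tags "[]+" and "+[]" are hypothetical encodings of the hole syntaxes, an ordinary addition rule on integers is assumed, and the left argument is always heated first, whereas K may heat either argument.

```python
def is_kresult(t) -> bool:
    """Models: syntax KResult ::= Int"""
    return isinstance(t, int)


def step(k: list) -> bool:
    """Apply one heat, cool, or addition step to the front of the k cell."""
    top = k[0]
    if isinstance(top, tuple) and top[0] == "+":
        _, a, b = top
        if not is_kresult(a):      # addExp1-heat: only non-results are heated
            k[0:1] = [a, ("[]+", b)]
        elif not is_kresult(b):    # addExp2-heat
            k[0:1] = [b, ("+[]", a)]
        else:                      # assumed rule: I1 + I2 => I1 +Int I2
            k[0] = a + b
        return True
    if is_kresult(top) and len(k) > 1:
        kind, other = k[1]
        # addExp1-cool / addExp2-cool: plug the result back into the hole.
        k[0:2] = [("+", top, other) if kind == "[]+" else ("+", other, top)]
        return True
    return False


def run(term):
    """Rewrite until no step applies and return the front of the k cell."""
    k = [term]
    while step(k):
        pass
    return k[0]


print(run(("+", ("+", 1, 2), ("+", 3, 4))))  # 10
```

Because only non-results are heated and only results are cooled, the heat and cool steps cannot undo each other indefinitely.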
Note that you can also say that a given expression is strict only in
specific argument positions. Here we use this to define "short-circuiting"
boolean operators.
syntax KResult ::= Bool
syntax BExp ::= Bool
              | BExp "||" BExp [strict(1)]
              | BExp "&&" BExp [strict(1)]
rule <k> true || _ => true ... </k>
rule <k> false || REST => REST ... </k>
rule <k> true && REST => REST ... </k>
rule <k> false && _ => false ... </k>
If you want to force a specific evaluation order of the arguments, you can use
the variant seqstrict
to do so. For example, this would make the boolean
operators short-circuit in their second argument first:
syntax KResult ::= Bool
syntax BExp ::= Bool
              | BExp "||" BExp [seqstrict(2,1)]
              | BExp "&&" BExp [seqstrict(2,1)]
rule <k> _ || true => true ... </k>
rule <k> REST || false => REST ... </k>
rule <k> REST && true => REST ... </k>
rule <k> _ && false => false ... </k>
This will generate rules like this in the case of _||_
(note that BE1
will
not be heated unless isKResult(BE2)
is true, meaning that BE2
must be
evaluated first):
rule <k> BE1:BExp || HOLE:BExp => HOLE ~> BE1 || [] ... </k> [heat]
rule <k> HOLE:BExp || BE2:BExp => HOLE ~> [] || BE2 ... </k> requires isKResult(BE2) [heat]
rule <k> HOLE:BExp ~> [] || BE2 => HOLE || BE2 ... </k> [cool]
rule <k> HOLE:BExp ~> BE1 || [] => BE1 || HOLE ... </k> [cool]
Sometimes more advanced evaluation strategies are needed. By default, the
strict
and seqstrict
attributes are limited in that they cannot describe
the context in which heating or cooling should occur. When this type of
control over the evaluation strategy is required, context
sentences can be
used to simplify the process of declaring heating and cooling when it would be
unnecessarily verbose to write heating and cooling rules directly.
For example, if the user wants to heat a term of sort Bar when it appears
under a foo constructor, one might write the following context (with the
optional label):
context [foo]: foo(HOLE:Bar)
Once again, note that HOLE
is just a variable, but one that has special
meaning to the compiler indicating the position in the context that should
be heated or cooled.
This will automatically generate the following sentences:
rule [foo-heat]: <k> foo(HOLE:Bar) => HOLE ~> foo([]) ... </k> [heat]
rule [foo-cool]: <k> HOLE:Bar ~> foo([]) => foo(HOLE) ... </k> [cool]
The user may also write the K cell explicitly in the context declaration
if they want to match on another cell as well, for example:
context <k> foo(HOLE:Bar) ... </k> <state> .Map </state>
This context will now only heat or cool if the state
cell is empty.
The user is allowed to write a side condition in a context declaration, like
so:
context foo(HOLE:Bar) requires baz(HOLE)
This side condition will be appended verbatim to the heating rule that is
generated, however, it will not affect the cooling rule that is generated:
rule <k> foo(HOLE:Bar) => HOLE ~> foo([]) ... </k> requires baz(HOLE) [heat]
rule <k> HOLE:Bar ~> foo([]) => foo(HOLE) ... </k> [cool]
The user can also include exactly one rewrite operation in a context
declaration if that rewrite has the variable HOLE on its left-hand side
and a term containing HOLE on its right-hand side. For example:
context foo(HOLE:Bar => bar(HOLE))
In this case, the code generated will be as follows:
rule <k> foo(HOLE:Bar) => bar(HOLE) ~> foo([]) ... </k> [heat]
rule <k> bar(HOLE:Bar) ~> foo([]) => foo(HOLE) ... </k> [cool]
This can be useful if the user wishes to evaluate a term using a different
set of rules than normal.
result attribute
Sometimes it is necessary to be able to evaluate a term to a different sort
than KResult
. This is done by means of adding the result
attribute to
a strict production, a context, or an explicit heating or cooling rule:
syntax BExp ::= Bool
              | BExp "||" BExp [seqstrict(2,1), result(Bool)]
In this case, the sort check used by seqstrict
and by the heat
and cool
attributes will be isBool
instead of isKResult
. This particular example
does not really require use of the result
attribute, but if the user wishes
to evaluate a term of sort KResult further, the result attribute would be
required.
hybrid attribute
In certain situations, it is desirable to treat a particular production which
has the strict
attribute as a result if the term has had its arguments fully
evaluated. This can be accomplished by means of the hybrid
attribute:
syntax KResult ::= Bool
syntax BExp ::= Bool
              | BExp "||" BExp [strict(1), hybrid]
This attribute is equivalent in this case to the following additional axiom
being added to the definition of isKResult
:
rule isKResult(BE1:BExp || BE2:BExp) => true requires isKResult(BE1)
Sometimes you wish to declare a production hybrid with respect to a predicate
other than isKResult
. You can do this by specifying a sort as the body of the
hybrid
attribute, e.g.:
syntax BExp ::= BExp "||" BExp [strict(1), hybrid(Foo)]
generates the rule:
rule isFoo(BE1:BExp || BE2:BExp) => true requires isFoo(BE1)
Properly speaking, hybrid
takes an optional comma-separated list of sort
names. If the list is empty, the attribute is equivalent to hybrid(KResult)
.
Otherwise, it generates hybrid predicates for exactly the sorts named.
Sometimes it is necessary to define a fairly complicated evaluation strategy
for a lot of different operators. In this case, the user could simply write
a number of complex context
declarations, however, this quickly becomes
tedious. For this purpose, K has a concept called a context alias. A context
alias is a bit like a template for describing contexts. The template can then
be instantiated against particular productions using the strict
and
seqstrict
attributes.
Here is a (simplified) example taken from the K semantics of C++:
context alias [c]: <k> HERE:K ... </k> <evaluate> false </evaluate>
context alias [c]: <k> HERE:K ... </k> <evaluate> true </evaluate> [result(ExecResult)]
syntax Expr ::= Expr "=" Init [strict(c; 1)]
This defines the evaluation strategy during the translation phase of a C++
program for the assignment operator. It is equivalent to writing the following
context declarations:
context <k> HOLE:Expr = I:Init ... </k> <evaluate> false </evaluate>
context <k> HOLE:Expr = I:Init ... </k> <evaluate> true </evaluate> [result(ExecResult)]
What this is saying is, if the evaluate
cell is false, evaluate the term
like normal to a KResult
. But if the evaluate
cell is true, instead
evaluate it to the ExecResult
sort.
Essentially, we have given a name to this evaluation strategy in the form of
the rule label on the context alias sentences (in this case, c
). We can
then say that we want to use this evaluation strategy to evaluate particular
arguments of particular productions by referring to it by name in a strict
attribute. For example, strict(c)
will instantiate these contexts once for
each argument of the production, whereas strict(c; 1)
will instantiate it
only for the first argument. The special variable HERE
is used to tell the
compiler where you want to place the production that is to be heated or cooled.
You can also specify multiple context aliases for different parts of a production,
for example:
syntax Exp ::= foo(Exp, Exp) [strict(left; 1; right; 2)]
This says that we can evaluate the left and right arguments in either order, but to evaluate
the left using the left
context alias and the right using the right
context alias.
We can also say seqstrict(left; 1; right; 2)
, in which case we additionally must evaluate
the left argument before the right argument. Note, all strict positions are considered collectively
when determining the evaluation order of seqstrict
or the hybrid
predicates.
A strict
attribute with no rule label associated with it is equivalent to
a strict
attribute given with the following context alias:
context alias [default]: <k> HERE:K ... </k>
One syntactic convenience that is provided is that if you wish to declare the following context:
context foo(HOLE => bar(HOLE))
you can simply write the following:
syntax Foo ::= foo(Bar) [strict(alias)]
context alias [alias]: HERE [context(bar)]
New syntax has been added to K for matching a pattern and binding the resulting
match in its entirety to a variable.
The syntax is:
Pattern #as V::Var
In this case, Pattern, including any variables, is matched and the resulting
variables are added to the substitution if matching succeeds. Furthermore, the
term matched by Pattern is added to the substitution as V.
This code can also be used outside of any rewrite, in which case matching
occurs as if it appeared on the left hand side, and the right hand side becomes
a variable corresponding to the alias.
It is an error to use an as pattern on the right hand side of a rule.
We have added a syntax for matching on KApply terms which mimics the record
syntax in functional languages. This allows us to more easily express patterns
involving a KApply term in which we don't care about some or most of the
children, without introducing a dependency into the code on the number of
arguments which could be changed by a future refactoring.
The syntax is:
record(... field1: Pattern1, field2: Pattern2)
Note that this only applies to productions that are prefix productions.
A prefix production is considered by the implementation to be any production
whose production items match the following regular expression:
(Terminal(_)*) Terminal("(")
(NonTerminal (Terminal(",") NonTerminal)* )?
Terminal(")")
In other words, any sequence of terminals followed by an open parenthesis, an
optional comma separated list of non-terminals, and a close parenthesis.
If a prefix production has no named nonterminals, a record(...)
syntax is
allowed, but in order to reference specific fields, it is necessary to give one
or more of the non-terminals in the production names.
Note: because the implementation currently creates one production per possible
set of fields to match on, and because all possible permutations of all
possible subsets of a list of n elements is a number that scales factorially
and reaches over 100 thousand productions at n=8, we currently do not allow
fields to be matched in any order like a true record, but only in the same
order as appears in the production itself.
Given that this only reduces the number of productions to the size of the power
set, this will still explode the parsing time if we create large productions of
10 or more fields that all have names. This is something that should probably
be improved, however, productions with that large of an arity are rare, and
thus it has not been viewed as a priority.
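The counts quoted above can be checked with a few lines of Python. The function names are hypothetical; the formulas are the standard ones for the number of ordered selections from n items (the sum of P(n, k) over all k) and the size of the power set (2 to the power n).

```python
from math import perm


def ordered_field_selections(n: int) -> int:
    """Permutations of all subsets of n fields: the sum of P(n, k) for k = 0..n."""
    return sum(perm(n, k) for k in range(n + 1))


def in_order_field_selections(n: int) -> int:
    """Subsets of n fields when production order is kept: 2 ** n."""
    return 2 ** n


print(ordered_field_selections(8))    # 109601
print(in_order_field_selections(8))   # 256
print(in_order_field_selections(10))  # 1024
```

At n=8 the order-preserving scheme needs 256 productions instead of 109601, but the count still doubles with every additional named field.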
Sometimes you wish to express that a rule should match if one out of multiple
patterns should match the same subterm. We can now express this in K by means
of using the #Or
ML connective on the left hand side of a rule.
For example:
rule foo #Or bar #Or baz => qux
Here any of foo, bar, or baz will match this rule. Note that the behavior is
ill-defined if it is not the case that all the clauses of the or have the same
bound variables.
On occasion it is highly desirable to be able to look up information from the
global configuration and match against it when evaluating a function. For this
purpose, we introduce a new syntax for function rules.
This syntax allows the user to match on function context from within a
function rule:
syntax Int ::= foo(Int) [function]
rule [[ foo(0) => I ]]
     <bar> I </bar>
rule something => foo(0)
This is completely desugared by the K frontend and does not require any special
support in the backend. It is an error to have a rewrite inside function
context, as we do not currently support propagating such changes back into the
global configuration. It is also an error if the context is not at the top
level of a rule body.
Desugared code:
syntax Int ::= foo(Int, GeneratedTopCell) [function]
rule foo(0, <generatedTop> <bar> I </bar> ... </generatedTop> #as Configuration) => I
rule <generatedTop> <k> something ... </k> ... </generatedTop> #as Configuration
  => <generatedTop> <k> foo(0, Configuration) ... </k> ... </generatedTop>
It is allowed to write patterns on the left hand side of rules which refer to
complex terms of sort Map, List, and Set, despite these patterns ostensibly
breaking the rule that terms which are functions should not appear on the left
hand side of rules. Such terms are destructured into pattern matching
operations.
The following forms are allowed:
// 0 or more elements followed by 0 or 1 variables of sort List followed by
// 0 or more elements
ListItem(E1) ListItem(E2) L:List ListItem(E3) ListItem(E4)
// the empty list
.List
// 1 or more list update operations applied to a variable
L:List [ K1 <- E1 ] [ K2 <- E2 ]
// 0 or more elements in any order plus 0 or 1 variables of sort Set
// in any order
SetItem(K1) SetItem(K2) S::Set SetItem(K3) SetItem(K4)
// the empty set
.Set
// 0 or more elements in any order plus 0 or 1 variables of sort Map
// in any order
K1 |-> E1 K2 |-> E2 M::Map K3 |-> E3 K4 |-> E4
// the empty map
.Map
Here K1, K2, K3, K4 etc can be any pattern except a pattern containing both
function symbols and unbound variables. An unbound variable is a variable whose
binding cannot be determined by means of decomposing non-set-or-map patterns or
map elements whose keys contain no unbound variables.
This is determined recursively, i.e., the term K1 |-> E2 E2 |-> E3 E3 |-> E4
is
considered to contain no unbound variables.
Note that in the pattern K1 |-> E2 K3 |-> E4 E4 |-> E5
, K1 and K3 are
unbound, but E4 is bound because it is bound by deconstructing the key K3, even
though K3 is itself unbound.
In the above examples, E1, E2, E3, and E4 can be any pattern that is normally
allowed on the lhs of a rule.
When a map, set, or list key contains function symbols, we know that the
variables in that key are bound (because of the above restriction), so it is
possible to evaluate the function to a concrete term prior to performing the
lookup.
Indeed, this is the precise semantics which occurs; the function is evaluated
and the result is looked up in the collection.
For example:
syntax Int ::= f(Int) [function]
rule f(I:Int) => I +Int 1
rule <k> I:Int => . ... </k> <state> ... SetItem(f(I)) ... </state>
This will rewrite I
to .
if and only if the state cell contains
I +Int 1
.
Note that in the case of Set and Map, one guarantee is that K1, K2, K3, and K4
represent /distinct/ elements. Pattern matching fails if the correct number of
distinct elements cannot be found.
K allows matching fragments of the configuration and using them to construct
terms and use as function parameters.
configuration <t>
                <k> #init ~> #collectOdd ~> $PGM </k>
                <fs>
                  <f multiplicity="*" type="Set"> 1 </f>
                </fs>
              </t>
The #collectOdd
construct grabs the entire content of the <fs>
cell.
We may also match on only a portion of its content. Note that the fragment
must be wrapped in a <fs>
cell at the call site.
syntax KItem ::= "#collectOdd"
rule <k> #collectOdd => collectOdd(<fs> Fs </fs>) ... </k>
     <fs> Fs </fs>
The collectOdd
function collects the items it needs
syntax Set ::= collectOdd(FsCell) [function] rule collectOdd(<fs> <f> I </f> REST </fs>) => SetItem(I) collectOdd(<fs> REST </fs>) requires I %Int 2 ==Int 1 rule collectOdd(<fs> <f> I </f> REST </fs>) => collectOdd(<fs> REST </fs>) requires I %Int 2 ==Int 0 rule collectOdd(<fs> .Bag </fs>) => .Set
all-path and one-path attributes to distinguish reachability claims
The Haskell backend can handle both one-path and all-path reachability
claims, but both kinds are encoded as rewrite rules in K, so these attributes can
be used to clarify what kind of claim a rule is.
In addition to annotating a single rule with one of them
(if a rule is annotated with both, only one of them is chosen),
one can also annotate whole modules, to give a default claim type for all rules
in that module.
Additionally, the Haskell backend introduces an extra command line option
for the K frontend, --default-claim-type
, with possible values
all-path
and one-path
to allow choosing a default type for all
claims.
Set variables were introduced as part of Matching Mu Logic, the mathematical
foundations for K. In Matching Mu Logic, terms evaluate to sets of values.
This is useful for both capturing partiality (as in 3/0
) and capturing
non-determinism (as in 3 #Or 5
). Consequently, symbol interpretation is
extended to have a collective interpretation over sets of input values.
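The collective interpretation can be sketched with a small model. Python is used here only as a modelling language; it is not part of K, and the names lift, int_div and ceil are hypothetical. The sketch also ignores how /Int rounds on negative operands.

```python
from itertools import product

def lift(op):
    """Extend a partial binary operation to sets of input values."""
    def lifted(xs, ys):
        results = set()
        for x, y in product(xs, ys):
            value = op(x, y)
            if value is not None:  # undefined results contribute no value
                results.add(value)
        return results
    return lifted

def int_div(x, y):
    """Partial integer division; None models 'undefined'."""
    return None if y == 0 else x // y

div = lift(int_div)

def ceil(values):
    """Model of #Ceil: a pattern is defined when it denotes at least one value."""
    return len(values) > 0

three_div_zero = div({3}, {0})  # partiality: denotes the empty set
three_or_five = {3} | {5}       # non-determinism: denotes two values
```

In this model an undefined term is simply the empty set, so a symbol applied to it yields the empty set as well.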
Usually, K rules are given using regular variables, which expect that the term
they match is both defined and has a unique interpretation.
However, it is sometimes useful to have simplification rules which work over
any kind of pattern, be it undefined or non-deterministic. This behavior can be
achieved by using set variables to stand for any kind of pattern.
Any variable prefixed by @
will be considered a set variable.
Below is a simplification rule which motivated this extension:
rule #Ceil(@I1:Int /Int @I2:Int) =>
{(@I2 =/=Int 0) #Equals true} #And #Ceil(@I1) #And #Ceil(@I2)
[anywhere]
This rule basically says that @I1:Int /Int @I2:Int is defined if @I1 and
@I2 are defined and @I2 is not 0. Using set variables here is important, as
it allows the simplification rule to apply to any symbolic pattern, without
caring whether it is defined or not.
This allows simplifying the expression #Ceil((A:Int /Int B:Int) /Int C:Int)
to:
{(C =/=Int 0) #Equals true} #And #Ceil(C) #And ({(B =/=Int 0) #Equals true}
#And #Ceil(B) #And #Ceil(A))
See kframework/kore#729 for
more details.
K makes queries to an SMT solver (Z3) to discharge proof obligations when doing
symbolic execution. You can control how these queries are made using the
attributes smtlib
, smt-hook
, and smt-lemma
on declared productions.
These attributes guide the prover when it tries to apply rules to discharge a
proof obligation.
- smt-hook(...) allows you to specify a term in SMTLIB2 format which should be used to encode the production.
- smtlib(...) allows you to declare a new SMT symbol to be used when that production is encoded.
- smt-lemma can be applied to a rule to encode it as a conditional equality. A rule of the form rule LHS => RHS requires REQ will be encoded as (=> REQ (= LHS RHS)). Every symbol in the rule must have an smt-hook(...) or smtlib(...) attribute.
syntax Int ::= "~Int" Int [function, klabel(~Int_), symbol, smtlib(notInt)]
             | Int "^%Int" Int Int [function, klabel(_^%Int__), symbol, smt-hook((mod (^ #1 #2) #3))]
In the example above, we declare two productions ~Int_ and _^%Int__, and
tell the SMT solver to:
- encode ~Int_ via the SMTLIB2 symbol notInt, and
- use the SMTLIB2 term (mod (^ #1 #2) #3) (where #N marks the Nth argument) for _^%Int__, where mod and ^ are already declared by the SMT solver.
Set variables are currently only supported by the Haskell backend.
The use of rules with set variables should be sound for all other backends
which just execute by rewriting, however it might not be safe for backends
which want to guarantee coverage.
This section presents possible scenarios requiring variables to only appear in
the RHS of a rule.
Except for ?
variables and !
(fresh) variables, which are
required to only appear in the RHS of a rule, all other variables must
also appear in the LHS of a rule. This restriction also applies to anonymous
variables; in particular, for claims, ?_
(not _
) should be used in the RHS
to indicate that something changes but we don't care to what value.
To support specifying random-like behavior, the above restriction can be relaxed
by annotating a rule with the unboundVariables
attribute whenever the rule
intentionally contains regular variables only occurring in the RHS.
K uses question mark variables of the form ?X
to refer to
existential variables, and uses ensures
to specify logical constraints on
those variables.
These variables are only allowed to appear in the RHS of a K rule.
If the rules represent rewrite (semantic) steps or verification claims,
then the ?
variables are existentially quantified at the top of the RHS;
otherwise, if they represent equations, the ?
variables are quantified at the
top of the entire rule.
Note that when both ?
-variables and regular variables are present,
regular variables are (implicitly) universally quantified on top of the rule
(already containing the existential quantifications).
This essentially makes all ?
variables depend on all regular variables.
All examples below are intended more for program verification /
symbolic execution, and thus concrete implementations might choose to ignore
them altogether or to provide ad-hoc implementations for them.
Consider the following definition of a (transition) system:
module A
  rule foo => true
  rule bar => true
  rule bar => false
endmodule
Consider also, the following specification of claims about the definition above:
module A-SPEC
  rule [s1]: foo => ?X:Bool
  rule [s2]: foo => X:Bool [unboundVariables(X)]
  rule [s3]: bar => ?X:Bool
  rule [s4]: bar => X:Bool [unboundVariables(X)]
endmodule
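Under the one-path interpretation:
- (s1) says that there exists a path from foo to some boolean, which is satisfied by the foo => true rule.
- (s3) says the same thing about bar and can be satisfied by either of the bar => true and bar => false rules.
- (s2) and (s4) can be better understood by replacing them with instances for each element of type Bool, which can be interpreted that both true and false are reachable from foo for (s2), or bar for (s4), respectively.
  - (s2) cannot be verified, as there is no path from foo to false.
  - (s4) can be verified by using bar => true to show true is reachable and bar => false to achieve the same thing for false.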
Under the all-path interpretation:
- (s1) says that all paths from foo will reach some boolean, which is satisfied by the foo => true rule and the lack of other rules for foo.
- (s3) says the same thing about bar and can be satisfied by checking that both bar => true and bar => false end in a boolean, and there are no other rules for bar.
- (s2) and (s4) can be better understood by replacing them with instances for each element of type Bool, which can be interpreted that both true and false are reachable in all paths originating in foo for (s2), or bar for (s4), respectively. This is a very strong claim, requiring that all paths originating in foo (bar) pass through both true and false, so neither (s2) nor (s4) can be verified.
Interestingly enough, adding a rule like false => true
would make both
(s2) and (s4) hold.
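The difference between the two readings of claims (s1) to (s4) can be checked on a small model of module A. This is a Python sketch, not how any K backend proves claims: the names one_path, all_path and check are hypothetical, and the recursion assumes the transition system is finite and acyclic. A claim with a ? variable is read as "reaches some boolean"; a claim with an unbound regular variable is read as one instance per boolean, all of which must hold.

```python
# Transition system of module A: each term maps to its possible successors.
RULES = {"foo": ["true"], "bar": ["true", "false"]}
BOOLS = ["true", "false"]

def successors(term):
    return RULES.get(term, [])

def one_path(start, goal):
    """True if some path from start reaches a term satisfying goal."""
    if goal(start):
        return True
    return any(one_path(nxt, goal) for nxt in successors(start))

def all_path(start, goal):
    """True if every maximal path from start reaches a term satisfying goal."""
    if goal(start):
        return True
    nexts = successors(start)
    return bool(nexts) and all(all_path(nxt, goal) for nxt in nexts)

def check(reaches, start, existential):
    if existential:  # RHS is ?X:Bool
        return reaches(start, lambda t: t in BOOLS)
    # RHS is X:Bool with unboundVariables(X): one instance per boolean
    return all(reaches(start, lambda t, b=b: t == b) for b in BOOLS)

def verdicts(reaches):
    return {
        "s1": check(reaches, "foo", True),
        "s2": check(reaches, "foo", False),
        "s3": check(reaches, "bar", True),
        "s4": check(reaches, "bar", False),
    }

one_path_verdicts = verdicts(one_path)
all_path_verdicts = verdicts(all_path)
```

On the module as given, the model accepts (s1), (s3) and (s4) under the one-path reading and only (s1) and (s3) under the all-path reading.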
rand()
The random number construct rand()
is a language construct which could be
easily conceived to be part of the syntax of a programming language:
Exp ::= "rand" "(" ")"
The intended semantics of rand()
is that it can rewrite to any integer in
a single step. This could be expressed as the following infinitely
many rules.
rule rand() => 0
rule rand() => 1
rule rand() => 2
    ...     ...
rule rand() => (-1)
rule rand() => (-2)
    ...     ...
Since we need an instance of the rule for every integer, one could summarize
the above infinitely many rules with the rule
rule rand() => I:Int [unboundVariables(I)]
Note that I
occurs only in the RHS in the rule above, and thus the rule
needs the unboundVariables(I)
attribute to signal that this is intentional.
One can define variants of rand()
by further constraining the output variable
as a precondition to the rule.
- randBounded(M,N) can rewrite to any integer between M and N:
  syntax Exp ::= randBounded(Int, Int)
  rule randBounded(M, N) => I
    requires M <=Int I andBool I <=Int N
    [unboundVariables(I)]
- randInList(Is) takes a list Is of items and can rewrite in one step to any item in Is:
  syntax Exp ::= randInList (List)
  rule randInList(Is) => I
    requires I inList Is
    [unboundVariables(I)]
- randNotInList(Is) takes a list Is of items and can rewrite in one step to any item not in Is:
  syntax Exp ::= randNotInList (List)
  rule randNotInList(Is) => I
    requires notBool(I inList Is)
    [unboundVariables(I)]
- randPrime() can rewrite to any prime number:
  syntax Exp ::= randPrime ()
  rule randPrime() => X:Int
    requires isPrime(X)
    [unboundVariables(X)]
  where isPrime(_) is a predicate that can be defined in the usual way.
Note 1: None of the above are function symbols; they are language constructs.
Note 2: Currently the frontend does not allow rules with universally quantified
variables in the RHS which are not bound in the LHS.
Note 3: Allowing these rules in a concrete execution engine would require an
algorithm for generating concrete instances for such variables, satisfying the
given constraints; thus the unboundVariables
attribute serves two purposes:
fresh(Is)
The fresh integer construct fresh(Is)
is a language construct.
Exp ::= ... | "fresh" "(" List{Int} ")"
The intended semantics of fresh(Is) is that it can always rewrite to an
integer that is not in Is.
Note that fresh(Is) and randNotInList(Is) are different; the former
does not need to be able to rewrite to every integer not in Is,
while the latter requires so.
For example, it is correct to implement fresh(Is) so it always returns the
smallest positive integer that is not in Is, but the same implementation for
randNotInList(Is) might be considered inadequate.
In other words, there exist multiple correct implementations of fresh(Is)
,
some of which may be deterministic, but there only exists a unique
implementation of randNotInList(Is)
.
Finally, note that randNotInList(Is) is a correct implementation
for fresh(Is); hence, concrete execution engines can choose to handle
such rules accordingly.
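The contrast between the two constructs can be made concrete with a deterministic candidate implementation. This is a Python sketch with hypothetical names (fresh_smallest, satisfies_fresh, covers_rand_not_in_list), not an engine's actual strategy: it shows that "smallest positive integer not in Is" meets the obligation of fresh(Is), but cannot meet that of randNotInList(Is) because it can only ever produce one value.

```python
def fresh_smallest(used):
    """One admissible implementation of fresh(Is): the smallest positive integer not in Is."""
    taken = set(used)
    candidate = 1
    while candidate in taken:
        candidate += 1
    return candidate

def satisfies_fresh(result, used):
    """The only obligation on fresh(Is): the result must not occur in Is."""
    return result not in used

def covers_rand_not_in_list(implementation, used, sample):
    """randNotInList(Is) must be able to produce every integer outside Is.

    Checks this on a finite sample for a deterministic implementation,
    which can reach exactly one value.
    """
    reachable = {implementation(used)}
    return all(n in reachable for n in sample if n not in used)

used = [1, 2, 4]
picked = fresh_smallest(used)
```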
We use the following K syntax to define fresh(Is):
syntax Exp ::= fresh (List{Int})
rule fresh(Is:List{Int}) => ?I:Int
  ensures notBool (?I inList{Int} Is)
A variant of this would be a choiceInList(Is)
language construct which would
choose some number from a list:
syntax Exp ::= choiceInList (List{Int})
rule choiceInList(Is:List{Int}) => ?I:Int
  ensures ?I inList{Int} Is
Note: This definition is different from one using a !
variable to indicate
freshness because using !
is just syntactic sugar for generating globally
unique instances and relies on a special configuration cell, and cannot be
constrained, while the fresh
described here is local and can be constrained.
While the first is more appropriate for concrete execution, this might be
better for symbolic execution / program verification.
arb()
The function arb()
is not a PL construct, but a mathematical function.
Therefore, its definition should not be interpreted as an execution step, but
rather as an equality.
The intended semantics of arb()
is that it is an unspecified nullary function.
The exact return value of arb()
is unspecified in the semantics but up to the
implementations.
However, being a mathematical function, arb()
must return the same value in
any one implementation.
We do not need special frontend syntax to define arb()
.
We only need to define it in the usual way as a function
(instead of a language construct), and provide no axioms for it.
The total
attribute ensures that the function is total, i.e.,
that it evaluates to precisely one value for each input.
There are many variants of arb()
. For example, arbInList(Is)
is
an unspecified function whose return value must be an element from Is
.
Note that arbInList(Is)
is different from choiceInList(Is)
, because
choiceInList(Is)
transitions to an integer in Is
(could be a different one
each time it is used), while arbInList(Is) is equal to a (fixed)
integer in Is.
W.r.t. the arb
variants, we can use ?
variables and the function
annotation to signal that we're defining a function and the value of the
function is fixed, but non-determinate.
syntax Int ::= arbInList(List{Int}) [function]
rule arbInList(Is:List{Int}) => ?I:Int
  ensures ?I inList{Int} Is
If elimination of existentials in equational rules is needed, one possible
approach would be through Skolemization,
i.e., replacing the ?
variable with a new uninterpreted function depending
on the regular variables present in the function.
interval()
The symbol interval(M,N)
is not a PL construct, nor a function in the
first-order sense, but a proper matching-logic symbol, whose interpretation is
in the powerset of its domain.
Its axioms will not use rewrites but equalities.
The intended semantics of interval(M,N)
is that it equals the set of
integers that are larger than or equal to M
and smaller than or equal to N
.
Since expressing the axiom for interval requires an existential
quantification on the right-hand side, thus making it a non-total symbol
defined through an equation, using ?
variables might be confusing since their
usage would be different from that presented in the previous sections.
Hence, the proposal to support this would be to write this as a proper ML rule.
A possible syntax for this purpose would be:
eq interval(M,N)
==
#Exists X:Int .
(X:Int #And { X >=Int M #Equals true } #And { X <=Int N #Equals true })
Additionally, the symbol declaration would require a special attribute to
signal the fact that it is not a constructor but a defined symbol.
Since this feature is not clearly needed by K users at the moment, it is only
presented here as an example; its implementation will be postponed for such time
when its usefulness becomes apparent.
In addition to on-the-fly parser generation using kast
, K is capable of
ahead-of-time parser generation of LR(1) or GLR parsers using Flex and Bison.
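This can be done in one of two ways.
- You can explicitly request a particular parser by invoking kast --gen-parser <outputFile> or kast --gen-glr-parser <outputFile>, respectively. kast will then create a parser based on the same command line flags that govern on-the-fly parsing, such as -s to specify the starting sort, and -m to specify the module to parse under.
- You can request parsers for the configuration variables of your definition, such as $PGM, by passing the --gen-bison-parser or --gen-glr-bison-parser flags to kompile. kompile will decide the sorts to use as start symbols based on the sorts of the configuration variables. Parsers for variables other than $PGM require a parser attribute on the cell that declares them. For example, given the cell <cell> foo($FOO:Foo, $BAR:Bar) </cell>, the attribute parser="FOO, TEST; BAR, TEST2" indicates that configuration variable $FOO should be parsed in the TEST module, and configuration variable $BAR should be parsed in the TEST2 module. If the user forgets to annotate the declaration with the parser attribute, only the $PGM parser will be generated.
Bison-generated parsers are extremely fast compared to kast, but they have
some important limitations: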
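- They always output KORE. You can pass that output directly to llvm-krun or kore-exec and bypass the krun frontend, making execution very fast.
- They do not support macros. In most cases anywhere rules can be used instead, although these do not benefit from unparsing via the alias attribute.
- If you choose LR(1) parsing, it is your obligation to ensure that the grammar is LR(1). Otherwise the generated parser will contain conflicts and may behave differently than kast would (kast is a GLL parser, ie, it handles all context-free grammars). K provides an attribute, not-lr1, which can be applied to modules known to not be LR(1); generating an LR(1) parser that imports such a module is reported as an error.
- If you are using LR(1) parsing, the prefer and avoid attributes are ignored.
- If you are using GLR parsing, the grammar should have as few conflicts as possible. One tool that can help is to pass --bison-lists to kompile. This will disable support for the List{Sort} syntax production and make NeList{Sort} left associative, but the productions generated for NeList{Sort} will be LR(1) and use bounded stack space.
- If the parsed term is ambiguous, a GLR parser returns an amb production that is parametric in the sort of the ambiguity; see the K-AMBIGUITIES module.
K is able to insert file, line, and column metadata into the parse tree on a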
per-sort basis when parsing using a bison-generated parser. To enable this,
mark the sort with the locations
attribute.
syntax Exp [locations]
syntax Exp ::= Exp "/" Exp | Int
K implicitly wraps productions of these sorts in a #location
term (see the
K-LOCATIONS
module in kast.md
). The metadata can thus be accessed with
ordinary rewrite rules:
rule #location(_ / 0, File, StartLine, _StartColumn, _EndLine, _EndColumn)
  => "Error: Division by zero at " +String File +String ":" +String Int2String(StartLine)
Sometimes it is desirable to allow code to be written in a file which
overwrites the current location information provided by the parser. This can be
done via a combination of the #LineMarker
sort and the --bison-file
flag to
the parser generator. If you declare a production of sort #LineMarker
which
contains a regular expression terminal, this will be treated as a
line marker by the bison parser. The user will then be expected to provide
an implementation of the parser for the line marker in C. The function expected
by the parser has the signature void line_marker(char *, yyscan_t)
, where
yyscan_t
is a
reentrant flex scanner.
The string value of the line marker token as specified by your regular
expression can be found in the first parameter of the function, and you can
set the line number used by the scanner using yyset_lineno(int, yyscan_t)
. If
you declare the variable extern char *filename
, you can also set the current
file name by writing a malloc'd, zero-terminated string to that variable.
A number of factors go into how terms are unparsed in K. Here we describe some
of the features the user can use to control how unparsing happens.
One of the phases that the unparser goes through is to insert productions
tagged with the bracket
attribute where it believes this is necessary
in order to create a correct string that will be parsed back into the original
AST. The most common case of this is in expression grammars. For example,
consider the following grammar:
syntax Exp ::= Int
             | Exp "*" Exp
             > Exp "+" Exp
Here we have declared that expressions can contain integer addition and
multiplication, and that multiplication binds tighter than addition. As a
result, when writing a program, if we want to write an expression that first
applies addition, then multiplication, we must use brackets: (1 + 2) * 3
.
Similarly, if we have such an AST, we must insert brackets into the AST
in order to faithfully unparse the term in a manner that will be parsed back
into the same AST, because if we do not, we end up unparsing the term as
1 + 2 * 3
, which will be parsed back as 1 + (2 * 3)
because of the priority
declaration in the grammar.
You can control how the unparser will insert such brackets by adding a
production with the bracket
attribute and the correct sort. For example, if,
instead of parentheses, you want to use curly braces, you could write:
syntax Exp ::= "{" Exp "}" [bracket]
This would signal to the unparser how brackets should look for terms of sort
Exp
, and it will use this syntax when unparsing terms of sort Exp
.
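The bracket-insertion phase can be sketched as a priority check while printing. This is a Python model of the idea, not the K unparser: the BinOp type, the PRIORITY table and the unparse function are hypothetical, and the sketch additionally assumes both operators are left associative, which the grammar above does not declare.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class BinOp:
    op: str
    left: "Exp"
    right: "Exp"

Exp = Union[int, BinOp]

# Lower number = binds tighter, mirroring: Exp "*" Exp > Exp "+" Exp
PRIORITY = {"*": 0, "+": 1}

def unparse(term, brackets=("(", ")")):
    """Print a term, inserting brackets only where reparsing would otherwise change the AST."""
    if isinstance(term, int):
        return str(term)

    def child(sub, is_right):
        text = unparse(sub, brackets)
        if isinstance(sub, BinOp):
            looser = PRIORITY[sub.op] > PRIORITY[term.op]
            # Assumed left associativity: an equal-priority right child needs brackets.
            same_on_right = PRIORITY[sub.op] == PRIORITY[term.op] and is_right
            if looser or same_on_right:
                text = brackets[0] + text + brackets[1]
        return text

    return f"{child(term.left, False)} {term.op} {child(term.right, True)}"

sum_then_product = BinOp("*", BinOp("+", 1, 2), 3)
with_parens = unparse(sum_then_product)
with_braces = unparse(sum_then_product, brackets=("{", "}"))
```

Passing a different pair of bracket strings plays the role of declaring a different production with the bracket attribute.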
One thing that K will do (unless you pass the --no-sort-collections
flag to
krun) is to sort associative, commutative collections (such as Set
and Map
)
alphanumerically. For example, if I have a collection whose keys are sort Id
and they have the values a, b, c, and d, then unparsing will always print
first the key a, then b, then c, then d, because this is the alphabetic order
of these keys when unparsed.
Furthermore, K will sort numeric keys numerically. For example, if I have a
collection whose keys are 1, 2, 5, 10, 30
, it will first display 1, then 2,
then 5, then 10, then 30, because it will sort these keys numerically. Note
that this is different than an alphabetic sort, which would sort them as
1, 10, 2, 30, 5
. We believe the former is more intuitive to users.
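The difference between the two orderings is easy to reproduce. The following Python sketch uses a hypothetical sort key, collection_sort_key, that compares numeric keys by value and all other keys by their printed text; placing numeric keys before non-numeric ones in a mixed collection is an assumption of the sketch, not documented K behavior.

```python
def collection_sort_key(key):
    """Order numeric keys by value and everything else alphabetically."""
    text = str(key)
    try:
        return (0, int(text), "")
    except ValueError:
        return (1, 0, text)

keys = ["10", "2", "30", "1", "5"]
numeric_order = sorted(keys, key=collection_sort_key)
alphabetic_order = sorted(keys)
identifier_order = sorted(["d", "b", "a", "c"], key=collection_sort_key)
```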
K will remove substitution terms corresponding to anonymous variables when
using the --pattern
flag if those anonymous variables provide no information
about the named variables in your search pattern. You can disable this behavior
by passing --no-substitution-filtering
to krun. When this flag is not passed,
and you are using the Haskell backend, any equality in a substitution (ie, an
#Equals
under an #And
under an #Or
), will be hidden from the user if the
left hand side is a variable that was anonymous in the --pattern
passed by
the user, unless that variable appears elsewhere in the substitution. If you
want to see that variable in the substitution, you can either disable this
filtering, or give that variable a name in the original search pattern.
K will automatically rename variables that appear in the output configuration.
Similar to commutative collections, this is done to normalize the resulting
configuration so that equivalent configurations will be printed identically
regardless of how they happen to be reached. This pass can be disabled by
passing --no-alpha-renaming
to krun.
K will apply macros in reverse on the output configuration if the macro was
created with the alias
or alias-rec
attribute. See the section on macro
expansion for more details.
format attribute
K allows you to control how terms are unparsed using the format attribute.
By default, a domain value is unparsed by printing its string value verbatim,
and an application pattern is unparsed by printing its terminals and children
in the sequence implied by its concrete syntax, separated by spaces. However,
K gives you complete control over how you want to unparse the symbol.
A format attribute is a string containing zero or more escape sequences that
tell K how to unparse the symbol. Escape sequences begin with a '%' and are
followed by either an integer, or a single non-digit character. Below is a
list of escape sequences recognized by the formatter:
Escape Sequence | Meaning |
---|---|
n | Insert '\n' followed by the current indentation level |
i | Increase the current indentation level by 1 |
d | Decrease the current indentation level by 1 |
c | Move to the next color in the list of colors for this production |
r | Reset color to the default foreground color for the terminal (See below for more information on how colors work) |
an integer | Print a terminal or nonterminal from the production (See below for more information) |
any other char | Print that character verbatim |
In the integer escape sequence %a
, the integer a
is treated as a 1-based
index into the terminals and nonterminals of the production.
If the offset refers to a terminal, move to the next color in the list of
colors for this production, print the value of that terminal, then reset the
color to the default foreground color for the terminal.
If the offset refers to a regular expression terminal, it is an error.
If the offset refers to a nonterminal, print the unparsed representation of
the corresponding child of the current term.
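The escape sequences can be read as a tiny interpreter over the format string. The following Python sketch models that reading; it is not K's formatter. The functions apply_format and default_format are hypothetical, the two-space indentation unit is an assumption, colors (%c and %r) are treated as no-ops, and the error for regular expression terminals is not modelled. A production is given as a list of ("t", text) terminals and ("nt", child_index) nonterminals.

```python
def default_format(item_count):
    """Default layout: every terminal and child in order, separated by spaces."""
    return " ".join(f"%{k}" for k in range(1, item_count + 1))

def apply_format(fmt, items, children, indent_unit="  "):
    """Interpret a format string against a production and its unparsed children."""
    out = []
    indent = 0
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch != "%":
            out.append(ch)
            i += 1
            continue
        i += 1
        if i >= len(fmt):
            raise ValueError("format string ends with a dangling %")
        if fmt[i].isdigit():
            j = i
            while j < len(fmt) and fmt[j].isdigit():
                j += 1
            kind, value = items[int(fmt[i:j]) - 1]  # 1-based index
            out.append(value if kind == "t" else children[value])
            i = j
        else:
            code = fmt[i]
            i += 1
            if code == "n":
                out.append("\n" + indent_unit * indent)
            elif code == "i":
                indent += 1
            elif code == "d":
                indent -= 1
            elif code in "cr":
                pass  # colors are ignored in this sketch
            else:
                out.append(code)  # any other character is printed verbatim
    return "".join(out)

# Hypothetical production: "if" Exp "then" Exp "else" Exp
if_items = [("t", "if"), ("nt", 0), ("t", "then"), ("nt", 1), ("t", "else"), ("nt", 2)]
pretty = apply_format("%1 %2 %3%i%n%4%d%n%5%i%n%6%d", if_items, ["c", "x", "y"])
plain = apply_format(default_format(6), if_items, ["c", "x", "y"])
```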
color and colors attributes
K allows you to take advantage of ANSI terminal codes for foreground color
in order to colorize output pretty-printed by the unparser. This is controlled
via the color
and colors
attributes of productions. These attributes
combine with the format
attribute to control how a term is colorized.
The first thing to understand about how colorization works is that the color
and colors
attributes are used to construct a list of colors associated
with each production, and the format attribute then uses that list to choose
the color for each part of the production. For more information on how the
format attribute chooses a color from the list, see above, but essentially,
each terminal or %c
in the format attribute advances the pointer in the list
by one element, and terminals and %r
reset the current color to the default
foreground color of the terminal afterwards.
There are two ways you can construct a list of colors associated with a
production:
The color
attribute creates the entire list all with the same color, as
specified by the value of the attribute. When combined with the default format
attribute, this will color all the terminals in that production that color, but
more advanced techniques can be used as well.
The colors
attribute creates the list from a manual, comma-separated list
of colors. The attribute is invalid if the length of the list is not equal to
the number of terminals in the production plus the number of %c
substrings in
the format
attribute.
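The length requirement on the colors attribute can be written down as a small check. This is a Python sketch with hypothetical helper names (count_color_slots, colors_attribute_is_valid); splitting the attribute value on commas and trimming whitespace is an assumption about the syntax, not something stated above.

```python
def count_color_slots(fmt, terminal_count):
    """Number of colors a production needs: one per terminal plus one per %c."""
    slots = terminal_count
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            if fmt[i + 1] == "c":
                slots += 1
            i += 2  # skip the escaped character so "%%c" is not miscounted
        else:
            i += 1
    return slots

def colors_attribute_is_valid(colors, fmt, terminal_count):
    """True if the comma-separated color list has exactly the required length."""
    names = [name.strip() for name in colors.split(",")]
    return len(names) == count_color_slots(fmt, terminal_count)

# Hypothetical production with two terminals (items 1 and 3) and one extra %c.
example_format = "%1 %c%2%r %3"
```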
In K, many different syntactic categories accept an optional trailing list of
keywords known as attributes. Attribute lists have two different syntaxes,
depending on where they occur. Each attribute also has a type which describes
where it may occur.
The first syntax is a square-bracketed ([]) list of words. This syntax is
available for the following attribute types:
- module attributes - may appear immediately after the module keyword
- sort attributes - may appear immediately after a sort declaration
- production attributes - may appear immediately after a BNF production
- rule attributes - may appear immediately after a rule
- context attributes - may appear immediately after a context or context alias declaration
- context alias attributes - may appear immediately after a context alias
- claim attributes - may appear immediately after a claim
The second syntax is the XML attribute syntax, i.e., a space-delimited list of
key-and-quoted-value pairs appearing inside the start tag of an XML element:
<element key1="value" key2="value2" ... > </element>. This syntax is
available for the following attribute types:
- cell attributes - may appear inside of the cell start tag in configuration declarations
Unrecognized attributes are reported as an error. When we talk about
the type of an attribute, we mean a syntactic category to which an attribute
can be attached where the attribute has some semantic effect.
We now provide an index of available attributes organized alphabetically with a
brief description of each. Note that the same attribute may appear in the index
multiple times to indicate its effect in different contexts or with/without
arguments. A legend describing how to interpret the index follows.
Some attributes should not generally appear in user code, except in some
unusual or complex examples. Such attributes are typically generated by the
compiler and used internally. We list these attributes below as a reference for
interested readers:
Name | Type | Backend | Reference |
---|---|---|---|
assoc | prod | all | assoc, comm, idem and unit attributes |
comm | prod | all | assoc, comm, idem and unit attributes |
digest | mod | all | Contains the hash of the textual contents of the module. |
idem | prod | all | assoc, comm, idem and unit attributes |
unit | prod | all | assoc, comm, idem and unit attributes |
userList | prod | all | Identifies the desugared form of Lst ::= List{Elm,"delim"} |
predicate | prod | all | Specifies the sort of a predicate label |
element | prod | all | Specifies the label of the elements in a list |
bracketLabel | prod | all | Keep track of the label of a bracket production since it can't have a klabel |
injective | prod | all | Label a given production as injective (unique output for each input) |
internal | prod | all | Production is reserved for internal use by the compiler |
cool | rule | all | strict and seqstrict attributes |
heat | rule | all | strict and seqstrict attributes |
- Name - the attribute's name (optionally followed by an underscore _ to indicate the attribute takes arguments)
- Type - the syntactic categories where this attribute is not ignored; the possible values are the types mentioned above or shorthands:
  - all - short for any type except cell
  - mod - short for module
  - sort
  - prod - short for production
  - rule
  - ctxt - short for context or context alias
  - claim
  - cell
- Backend - the backends that do not ignore this attribute; possible values:
  - all - all backends
  - llvm - the LLVM backend
  - haskell - the Haskell backend
- Effect - the attribute's effect (when it applies)
Backend features not yet given documentation:
To get a complete list of hooks supported by K, you can run:
grep -P -R "(?<=[^-])hook\([^)]*\)" k-distribution/include/kframework/builtin/ \
--include "*.k" -ho | \
sed 's/hook(//' | sed 's/)//' | sort | uniq | grep -v org.kframework
All of these hooks will also eventually need documentation.
Except for in a very limited number of special cases from the
K standard library.
The Maude documentation
has an example in a context that's somewhat similar to K; discussion of
ad-hoc overloading is not relevant.
This is a quick reference of the most commonly used K tools.
kompile (--gen-bison-parser)? {file} : generate parser, optionally with ahead of time
krun {file} : interpret file
krun -cPGM='{string}' : interpret string
kast --output (kore | kast) (-e|{file}) : parse expression or file
kompile (--enable-search --backend haskell)? {file} : generate parser, enabling non-deterministic run
krun (--search-all)? {file} : interpret file, evaluating non-deterministic runs as well
foo-kompiled/parser_PGM {file} : ahead of time parse
kompile (--main-module)? (--syntax-module)? {file} : generate parser for {file}.k {file}-syntax.k, explicitly state main modules
kparse <file> | kore-print - : parse and unparse a file
kompile {file} --enable-llvm-debug : generate debuggable output for {file}.k
krun {file} --debugger : debug K code
kprove {file} : Verify specs in {file}
During GDB debugging session (see here for
LLDB breakpoint syntax):
break {file}:{linenum} : add a breakpoint to {file}'s {linenum} numbered line
k match {module}.{label} subject : investigate matching
Here we document how to use some of the most commonly used K tools.
When one is working with kore-repl
or the prover in general and looking at
specific configurations using config, sometimes the configurations can be huge.
One tool to help print configurations compactly is the pyk print utility:
pyk print
We are going to use the --minimize option (which is actually used automatically
option (which is actually used automatically
when printing with pyk). This will filter out many uninteresting cells for the
current config and make the result more compact.
Then, when invoking the prover, you can minimize your output by piping it into
the pyk print ...
facility with arguments for controlling the output:
kprove --output json --definition DEFN ... \
  | jq .term \
  | pyk print DEFN /dev/stdin --omit-labels ... --keep-labels ...
You can also use this in the kore-repl
more easily, by making a help script.
In your current directory, save a new script pykprint.sh
:
#!/bin/bash
kast --input kore --output json --definition $1 /dev/stdin \
  | jq .term \
  | pyk print $1 /dev/stdin --omit-labels $2
Now call config | bash pykprint.sh DEFN
in Kore REPL to make the output
smaller.
The options you have to control the output are as follows:
- --no-minimize: do not remove uninteresting cells.
- --omit-cells: remove the selected cells from the output.
- --keep-cells: keep only the selected cells in the output.
Note: Make sure that there is no whitespace around , in the omit list,
otherwise you'll get an error (, is a list separator, so this
requirement is strict).
The LLVM Backend has support for integration with GDB. You can run the debugger
on a particular program by passing the --debugger
flag to krun, or by
invoking the llvm backend interpreter directly. Below we provide a simple
tutorial to explain some of the basic commands supported by the LLVM backend.
GDB is not well-supported on macOS, particularly on newer OS versions and Apple
Silicon ARM hardware. Consequently, if the --debugger
option is passed to krun
on macOS, LLDB[^1] is launched instead of GDB. However, the K-specific debugger
scripts that GDB uses have not been ported to LLDB yet, and so the instructions
in the rest of this section will not work.
Here is a sample K definition we will use to demonstrate debugging
capabilities:
module TEST
  imports INT
  configuration <k> foo(5) </k>
  rule [test]: I:Int => I +Int 1 requires I <Int 10

  syntax Int ::= foo(Int) [function]
  rule foo(I) => 0 -Int I
endmodule
You should compile this definition with --backend llvm --enable-llvm-debug
to
use the debugger most effectively.
Important: When you first run krun
with option --debugger
, GDB / LLDB
will instruct you on how to modify ~/.gdbinit
or ~/.lldbinit
to enable
printing abstract syntax of K terms in the debugger. If you do not perform this
step, you can still use all the other features, but K terms will be printed as
their raw address in memory.
GDB will need the kompiled interpreter in its safe path in order to access the
pretty printing python script within it. A good way to do this would be to pick
a minimum top-level path that covers all of your kompiled semantics (i.e., set auto-load safe-path ~/k-semantics). LLDB has slightly different security
policies that do not require fully-arbitrary code execution.
This section uses GDB syntax to demonstrate the debugging features. Please
refer to the GDB to LLDB command map on
macOS.
You can break before every step of execution is taken by setting a breakpoint
on the k_step
function.
(gdb) break definition.kore:k_step
Breakpoint 1 at 0x25e340
(gdb) run
Breakpoint 1, 0x000000000025e340 in step (subject=`<generatedTop>{}`(`<k>{}`(`kseq{}`(`inj{Int{}, KItem{}}`(#token("0", "Int")),dotk{}(.KList))),`<generatedCounter>{}`(#token("0", "Int"))))
(gdb) continue
Continuing.
Breakpoint 1, 0x000000000025e340 in step (subject=`<generatedTop>{}`(`<k>{}`(`kseq{}`(`inj{Int{}, KItem{}}`(#token("1", "Int")),dotk{}(.KList))),`<generatedCounter>{}`(#token("0", "Int"))))
(gdb) continue 2
Will ignore next crossing of breakpoint 1. Continuing.
Breakpoint 1, 0x000000000025e340 in step (subject=`<generatedTop>{}`(`<k>{}`(`kseq{}`(`inj{Int{}, KItem{}}`(#token("3", "Int")),dotk{}(.KList))),`<generatedCounter>{}`(#token("0", "Int"))))
(gdb)
You can break when a rule is applied by giving the rule a rule label. If the
module name is TEST and the rule label is test, you can break when the rule
applies by setting a breakpoint on the TEST.test.rhs function:
(gdb) break TEST.test.rhs
Breakpoint 1 at 0x25e250: file /home/dwightguth/test/./test.k, line 4.
(gdb) run
Breakpoint 1, TEST.test.rhs (VarDotVar0=`<generatedCounter>{}`(#token("0", "Int")), VarDotVar1=dotk{}(.KList), VarI=#token("0", "Int")) at /home/dwightguth/test/./test.k:4
4 rule [test]: I:Int => I +Int 1 requires I <Int 10
(gdb)
Note that the substitution associated with that rule is visible in the
description of the frame.
You can also break when a side condition is applied using the TEST.test.sc
function:
(gdb) break TEST.test.sc
Breakpoint 1 at 0x25e230: file /home/dwightguth/test/./test.k, line 4.
(gdb) run
Breakpoint 1, TEST.test.sc (VarI=#token("0", "Int")) at /home/dwightguth/test/./test.k:4
4 rule [test]: I:Int => I +Int 1 requires I <Int 10
(gdb)
Note that every variable used in the side condition can have its value
inspected when stopped at this breakpoint, but other variables are not visible.
You can also break on a rule by its location:
(gdb) break test.k:4
Breakpoint 1 at 0x25e230: test.k:4. (2 locations)
(gdb) run
Breakpoint 1, TEST.test.sc (VarI=#token("0", "Int")) at /home/dwightguth/test/./test.k:4
4 rule [test]: I:Int => I +Int 1 requires I <Int 10
(gdb) continue
Continuing.
Breakpoint 1, TEST.test.rhs (VarDotVar0=`<generatedCounter>{}`(#token("0", "Int")), VarDotVar1=dotk{}(.KList), VarI=#token("0", "Int")) at /home/dwightguth/test/./test.k:4
4 rule [test]: I:Int => I +Int 1 requires I <Int 10
(gdb) continue
Continuing.
Breakpoint 1, TEST.test.sc (VarI=#token("1", "Int")) at /home/dwightguth/test/./test.k:4
4 rule [test]: I:Int => I +Int 1 requires I <Int 10
(gdb)
Note that this sets a breakpoint at two locations: one on the side condition
and one on the right hand side. If the rule had no side condition, the first
would not be set. You can also view the locations of the breakpoints and
disable them individually:
(gdb) info breakpoint
Num Type Disp Enb Address What
1 breakpoint keep y <MULTIPLE>
breakpoint already hit 3 times
1.1 y 0x000000000025e230 in TEST.test.sc at /home/dwightguth/test/./test.k:4
1.2 y 0x000000000025e250 in TEST.test.rhs at /home/dwightguth/test/./test.k:4
(gdb) disable 1.1
(gdb) continue
Continuing.
Breakpoint 1, TEST.test.rhs (VarDotVar0=`<generatedCounter>{}`(#token("0", "Int")), VarDotVar1=dotk{}(.KList), VarI=#token("1", "Int")) at /home/dwightguth/test/./test.k:4
4 rule [test]: I:Int => I +Int 1 requires I <Int 10
(gdb) continue
Continuing.
Breakpoint 1, TEST.test.rhs (VarDotVar0=`<generatedCounter>{}`(#token("0", "Int")), VarDotVar1=dotk{}(.KList), VarI=#token("2", "Int")) at /home/dwightguth/test/./test.k:4
4 rule [test]: I:Int => I +Int 1 requires I <Int 10
(gdb)
Now only the breakpoint when the rule applies is enabled.
You can also break when a particular function in your semantics is invoked:
(gdb) info functions foo
All functions matching regular expression "foo":
File /home/dwightguth/test/./test.k:
struct __mpz_struct *Lblfoo'LParUndsRParUnds'TEST'UndsUnds'Int(struct __mpz_struct *);
(gdb) break Lblfoo'LParUndsRParUnds'TEST'UndsUnds'Int
Breakpoint 1 at 0x25e640: file /home/dwightguth/test/./test.k, line 6.
(gdb) run
Breakpoint 1, Lblfoo'LParUndsRParUnds'TEST'UndsUnds'Int (_1=#token("1", "Int")) at /home/dwightguth/test/./test.k:6
6 syntax Int ::= foo(Int) [function]
(gdb)
In this case, the variables have numbers instead of names because the names of
arguments in functions in K come from rules, and we are stopped before any
specific rule has applied. For example, _1 is the first argument to the
function.
You can also set a breakpoint in this location by setting it on the line
associated with its production:
(gdb) break test.k:6
Breakpoint 1 at 0x25e640: file /home/dwightguth/test/./test.k, line 6.
(gdb) run
Breakpoint 1, Lblfoo'LParUndsRParUnds'TEST'UndsUnds'Int (_1=#token("1", "Int")) at /home/dwightguth/test/./test.k:6
6 syntax Int ::= foo(Int) [function]
These two syntaxes are equivalent; use whichever is easier for you.
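The symbol name shown by info functions can be hard to read. The following is a minimal sketch of how such a name might be decoded, inferred only from the single example in this section: it assumes that text between single quotes is a run of four-character escape names, and that LPar, RPar, and Unds stand for "(", ")", and "_". The helper name decode_symbol and the escape table are hypothetical and deliberately partial; they are not part of K.

```python
# Partial escape table: only the escapes visible in the example symbol in this
# section. Both the table and the four-character assumption are inferred, not
# taken from K's documentation.
ESCAPES = {"LPar": "(", "RPar": ")", "Unds": "_"}


def decode_symbol(name):
    """Decode a mangled symbol such as the one printed by `info functions`."""
    if not name.startswith("Lbl"):
        raise ValueError("expected a symbol starting with 'Lbl'")
    parts = name[len("Lbl"):].split("'")
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            # Outside quotes: literal text.
            out.append(part)
        else:
            # Inside quotes: a run of four-character escape names.
            # An escape missing from the table raises KeyError.
            for j in range(0, len(part), 4):
                out.append(ESCAPES[part[j:j + 4]])
    return "".join(out)


print(decode_symbol("Lblfoo'LParUndsRParUnds'TEST'UndsUnds'Int"))
```

Under these assumptions the example decodes to foo(_)_TEST__Int, which reads as the production foo(Int) of sort Int declared in module TEST.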
You can also view the stack of function applications:
(gdb) bt
#0 Lblfoo'LParUndsRParUnds'TEST'UndsUnds'Int (_1=#token("1", "Int")) at /home/dwightguth/test/./test.k:6
#1 0x000000000025e5f8 in apply_rule_111 (VarDotVar0=`<generatedCounter>{}`(#token("0", "Int")), VarDotVar1=dotk{}(.KList)) at /home/dwightguth/test/./test.k:9
#2 0x0000000000268a52 in take_steps ()
#3 0x000000000026b7b4 in main ()
(gdb)
Here we see that foo was invoked while applying the rule on line 9 of test.k,
and we can also see the substitution of that rule. If foo were evaluated while
evaluating another function, we would also be able to see the arguments of that
function, unless the function was tail recursive, in which case no
stack frame would exist once the tail call was performed.
Using rbreak <regex> you can set breakpoints on multiple functions.
- rbreak Lbl - sets a breakpoint on all non-hooked functions
- rbreak Lbl.*TEST - sets a breakpoint on all functions from module TEST
- rbreak hook_INT - sets a breakpoint on all hooks from module INT
Other debugger issues:
- <optimized out> - try kompiling without -O1, -O2, or -O3.
- (gdb) break definition.kore:break -> No source file named definition.kore. -
  pass --enable-llvm-debug to kompile in order to generate debug info symbols.
The first thing to be aware of is that, in order to get meaningful data,
you need to build the semantics and all of its dependencies with
optimizations enabled but without the frame pointer elimination
optimization. For example, for EVM, this means rebuilding GMP, MPFR,
JEMalloc, Crypto++, SECP256K1, etc. with the following exports.
export CFLAGS="-DNDEBUG -O2 -fno-omit-frame-pointer"
export CXXFLAGS="-DNDEBUG -O2 -fno-omit-frame-pointer"
You can skip this step, but if you do, any samples within these
libraries will not have correct stack trace information, which means
you will likely not get a meaningful set of data that will tell you
where the majority of time is really being spent. Don't worry about
rebuilding literally every single dependency though. Just focus on the
ones that you expect to take a non-negligible amount of runtime. You
will be able to tell if you haven't done enough later, and you can go
back and rebuild more. Once this is done, you then build K with
optimizations and debug info enabled, like so:
mvn package -Dproject.build.type="FastBuild"
Next, you build the semantics with optimizations and debug info
enabled (i.e., kompile -ccopt -O2 --iterated -ccopt -fno-omit-frame-pointer).
Once all this is done, you should be ready to profile your
application. Essentially, you should run whatever test suite you
usually run, but with perf record -g -- prefixed to the front. For
example, for KEVM it's the following command. (For best data, don't
run this step in parallel.)
perf record -g -- make test-conformance
Finally, you want to filter out just the samples that landed within
the llvm backend and view the report. For this, you need to know the
name of the binary that was generated by your build system. Normally
it is interpreter, but e.g. if you are building the web3 client for
kevm, it would be kevm-client. You will want to run the following
command.
perf report -g -c $binary_name
If all goes well, you should see a breakdown of where CPU time has
been spent executing the application. You will know that sufficient
time was spent rebuilding dependencies with the correct flags when the
total time reported by the main method is close to 100%. If it's not
close to 100%, this is probably because a decent amount of self time
was reported in stack traces that were not built with frame pointers
enabled, meaning that perf was unable to walk the stack. You will have
to go back, rebuild the appropriate libraries, and then record your
trace again.
Your ultimate goal is to identify the hotspots that take the most
time, and make them execute faster. Entries like step and step_1234
refer to the cost of matching. An entry like side_condition_1234 is a
side condition, and apply_rule_1234 is constructing the right-hand side
of a rule. You can convert from this rule ordinal to a location using
the llvm-kompile-compute-loc script in the bin folder of the llvm
backend repo. For example,
llvm-kompile-compute-loc 5868 evm-semantics/.build/defn/llvm/driver-kompiled
prints the following text.
Line: 18529
/home/dwightguth/evm-semantics/./.build/defn/llvm/driver.k:493:10
This is the line of definition.kore that the axiom appears on, as
well as the original location of the rule in the K semantics. You can
use this information to figure out which rules and functions are
taking the most time and optimize them to be more efficient.
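When reading a long perf report, it can help to sort entries into the three kinds described above before looking up rule locations. The following is a minimal sketch that assumes symbol names follow exactly the step, step_N, side_condition_N, and apply_rule_N patterns mentioned in this section; the helper name classify_symbol is hypothetical and is not part of K or perf.

```python
import re

# Naming patterns taken from the entries described in this section.
# Anything else is reported as unrecognized (None).
_PATTERNS = [
    (re.compile(r"^apply_rule_(\d+)$"), "rule right-hand side"),
    (re.compile(r"^side_condition_(\d+)$"), "side condition"),
    (re.compile(r"^step(?:_(\d+))?$"), "matching"),
]


def classify_symbol(symbol):
    """Return (kind, number) for a perf symbol, or None if it is not recognized.

    `number` is the numeric suffix of the symbol, or None for a bare `step`.
    """
    for pattern, kind in _PATTERNS:
        match = pattern.match(symbol)
        if match:
            suffix = match.group(1)
            return kind, (int(suffix) if suffix is not None else None)
    return None


for name in ["step", "step_1234", "side_condition_1234", "apply_rule_5868", "main"]:
    print(name, "->", classify_symbol(name))
```

The numeric suffix of a side_condition or apply_rule entry is the value you would pass to llvm-kompile-compute-loc.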
The kserver is a front-end tool based on Nailgun which helps to reduce the
startup time of the JVM. Running kserver in a terminal window makes it wait for
all kompile/kprove calls and forces them to run in the same process and share
the same threads. This also reduces thread contention significantly. kompile
uses all the threads available to do rule parsing. Another benefit is that it
saves caches, so each time you call kprove/kast, you can access those directly
without extra disk usage.
Running the regression-new integration tests on a powerful machine (32 threads)
takes 8m; with the kserver active it takes 2m. You can start the kserver in two
ways:
- Run kserver in the command line. Close it after you are done testing. Useful
  for quick testing.
- Run spawn-kserver <log.file> and close it with stop-kserver. This is used for
  automation on CI.
Because we reuse caches, you should stop and restart the server between runs.
The Nailgun implementation hasn't been updated in the last 3-5 years, and it's
not compatible with Java 18 onwards.
The K Builtins (also referred to as the K Prelude or the K Standard Library)
consists of several files which contain definitions that make working with K
simpler. These files can be found under include/kframework/builtin in your K
installation directory, and can be imported with requires "FILENAME" (without
the path prefix).
A major piece of the K prelude consists of a series of modules that contain
implementations of basic data types and language features in K. You do not need
to require this file yourself; it is required automatically in every K
definition unless --no-prelude is passed to kompile. K may not work correctly
if some of these modules do not exist or do not declare certain functions.
Note that some functions in the K prelude are not total, that is,
they are not defined on all possible input values. When you invoke such a
function on an undefined input, the behavior is undefined. In particular, when
this happens, interpreters generated by the K LLVM backend may crash.
requires "kast.md"
K declares certain modules that contain most of the builtins you usually want
when defining a language in K. In particular, this includes integers, booleans,
strings, identifiers, I/O, lists, maps, and sets. The DOMAINS-SYNTAX module
is designed to be imported by the syntax module of the language and contains
only the program-level syntax of identifiers, integers, booleans, and strings.
The DOMAINS module contains the rest of the syntax, including builtin
functions over those and the remaining types.
Note that not all modules are included in DOMAINS. A few less-common modules
are not, including ARRAY, COLLECTIONS, FLOAT, STRING-BUFFER, BYTES,
K-REFLECTION, MINT.
module DOMAINS-SYNTAX
  imports SORT-K
  imports ID-SYNTAX
  imports UNSIGNED-INT-SYNTAX
  imports BOOL-SYNTAX
  imports STRING-SYNTAX
endmodule

module DOMAINS
  imports DOMAINS-SYNTAX
  imports INT
  imports BOOL
  imports STRING
  imports BASIC-K
  imports LIST
  imports K-IO
  imports MAP
  imports SET
  imports ID
  imports K-EQUAL
endmodule
Provided here is an implementation for fixed-sized, contiguous maps from Int
to KItem. In some previous versions of K, the Array type was a builtin type
backed by mutable arrays of objects. However, in modern K, the Array type is
implemented by means of the List type; users should not access this interface
directly and should instead make use only of the functions listed below. Users
of this module should import only the ARRAY module.
module ARRAY-SYNTAX
  imports private LIST

  syntax Array
You can look up an element in an Array by its index in O(log(N)) time. Note
that the base of the logarithm is a relatively high number and thus the time is
effectively constant.
syntax KItem ::= Array "[" Int "]" [function]
You can create a new Array with a new value for a key in O(log(N)) time, or
effectively constant.
syntax Array ::= Array "[" key: Int "<-" value: KItem "]" [function, symbol(_[_<-_])]
You can create a new Array where a particular key is reset to its default
value in O(log(N)) time, or effectively constant.
syntax Array ::= Array "[" Int "<-" "undef" "]" [function]
Given a List L of size N, you can create a new Array in which the N elements
starting at index are replaced with the contents of L, in O(N*log(K)) time
(where K is the size of the array), or effectively linear. Having
index + N > K yields an exception.
syntax Array ::= updateArray(Array, index: Int, List) [function]
You can create a new Array where the length elements starting at index are
replaced with value, in O(length*log(N)) time, or effectively linear.
syntax Array ::= fillArray(Array, index: Int, length: Int, value: KItem) [function]
You can test whether an integer is within the bounds of an array in O(1) time.
syntax Bool ::= Int "in_keys" "(" Array ")" [function, total]
endmodule

module ARRAY-IN-K [private]
  imports public ARRAY-SYNTAX
  imports private LIST
  imports private K-EQUAL
  imports private INT
  imports private BOOL
You can create an array with length elements where each element is
initialized to value in O(1) time. Note that the array is stored in a manner
where only the highest element that is actually modified is given a value
in its internal representation, which means that subsequent array operations
may incur a one-time O(N) resizing cost, possibly amortized across multiple
operations.
syntax Array ::= makeArray(length: Int, value: KItem) [function, public]
The remainder of this section consists of an implementation in K of the
operations listed above. Users of the ARRAY module should not make use
of any of the syntax defined in any of these modules.
syntax Array ::= arr(List, Int, KItem)

rule makeArray(I::Int, D::KItem) => arr(.List, I, D)

rule arr(L::List, _, _ ) [ IDX::Int ] => L[IDX] requires 0 <=Int IDX andBool IDX <Int size(L)
rule arr(_ , _, D::KItem) [ _ ] => D [owise]

syntax List ::= ensureOffsetList(List, Int, KItem) [function]
rule ensureOffsetList(L::List, IDX::Int, D::KItem) => L makeList(IDX +Int 1 -Int size(L), D) requires IDX >=Int size(L)
rule ensureOffsetList(L::List, IDX::Int, _::KItem) => L requires notBool IDX >=Int size(L)

rule arr(L::List, I::Int, D::KItem) [ IDX::Int <- VAL::KItem ] => arr(ensureOffsetList(L, IDX, D) [ IDX <- VAL ], I, D)

rule arr(L::List, I::Int, D::KItem) [ IDX::Int <- undef ] => arr(L, I, D) [ IDX <- D ]

rule updateArray(arr(L::List, I::Int, D::KItem), IDX::Int, L2::List) => arr(updateList(ensureOffsetList(L, IDX +Int size(L2) -Int 1, D), IDX, L2), I, D)

rule fillArray(arr(L::List, I::Int, D::KItem), IDX::Int, LEN::Int, VAL::KItem) => arr(fillList(ensureOffsetList(L, IDX +Int LEN -Int 1, D), IDX, LEN, VAL), I, D)

rule IDX::Int in_keys(arr(_, I::Int, _)) => IDX >=Int 0 andBool IDX <Int I
endmodule

module ARRAY-SYMBOLIC [symbolic]
  imports ARRAY-IN-K
endmodule

module ARRAY-KORE
  imports ARRAY-IN-K
endmodule

module ARRAY
  imports ARRAY-SYMBOLIC
  imports ARRAY-KORE
endmodule
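To make the lazy representation easier to follow, here is a small Python model of the arr(List, Int, KItem) term: a stored prefix, a declared length, and a default value. It is only an illustrative sketch of the rules in this section, not the backend's implementation, and the Python names (Arr, make_array, lookup, update, in_keys) are hypothetical.

```python
class Arr:
    """Model of arr(List, Int, KItem): stored prefix, declared length, default."""

    def __init__(self, items, length, default):
        self.items = list(items)
        self.length = length
        self.default = default


def make_array(length, value):
    # makeArray(I, D) => arr(.List, I, D): nothing is stored yet.
    return Arr([], length, value)


def lookup(a, idx):
    # The stored element if idx is inside the stored prefix, else the default
    # (this mirrors the [owise] rule).
    if 0 <= idx < len(a.items):
        return a.items[idx]
    return a.default


def ensure_offset(items, idx, default):
    # ensureOffsetList: pad with the default so that position idx exists.
    if idx >= len(items):
        return items + [default] * (idx + 1 - len(items))
    return items


def update(a, idx, value):
    # The K function is partial for negative indices; the model raises instead.
    if idx < 0:
        raise IndexError("negative index")
    items = ensure_offset(a.items, idx, a.default)
    items = items[:idx] + [value] + items[idx + 1:]
    return Arr(items, a.length, a.default)


def in_keys(idx, a):
    # Bounds are checked against the declared length, not the stored prefix.
    return 0 <= idx < a.length


a = make_array(5, 0)
b = update(a, 3, 7)
print(a.items, b.items, lookup(b, 3), lookup(b, 4))
```

In this model, creating the array stores nothing, and the first write to index 3 pads the stored prefix up to that index, which corresponds to the one-time resizing cost mentioned above.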
Provided here is the syntax of an implementation of immutable, associative,
commutative maps from KItem to KItem. This type is hooked to an
implementation of maps provided by the backend. For more information on
matching on maps and allowable patterns for doing so, refer to K's
user documentation.
module MAP
  imports private BOOL-SYNTAX
  imports private INT-SYNTAX
  imports private LIST
  imports private SET

  syntax Map [hook(MAP.Map)]
The Map sort represents a generalized associative array. Each key can be
paired with an arbitrary value, and can be used to reference its associated
value. Multiple bindings for the same key are not allowed.
You can construct a new Map consisting of key/value pairs of two Maps. The
result is #False if the maps have keys in common (in particular, this will
yield an exception during concrete execution). This operation is O(N*log(M)),
where N is the size of the smaller map and M is the size of the larger map,
when it appears on the right hand side. When it appears on the left hand side
and all variables are bound, it is O(N*log(M)) where M is the size of the map
it is matching and N is the number of elements being matched. When it appears
on the left hand side containing variables not bound elsewhere in the term, it
is O(N^K) where N is the size of the map it is matching and K is the number of
unbound keys being matched. In other words, one unbound variable is linear, two
is quadratic, three is cubic, etc.
syntax Map ::= Map Map [left, function, hook(MAP.concat), symbol(_Map_), assoc, comm, unit(.Map), element(_|->_), index(0), format(%1%n%2)]
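The O(N^K) bound for unbound keys can be illustrated by counting the candidate bindings a matcher might have to consider. This is a rough Python sketch of the worst case, not how any K backend actually searches (a real matcher may prune candidates early); the function name candidate_bindings is hypothetical.

```python
from itertools import permutations


def candidate_bindings(map_keys, unbound):
    """List every ordered choice of `unbound` distinct keys from `map_keys`.

    Each tuple is one way to bind the unbound key variables of a pattern.
    """
    return list(permutations(map_keys, unbound))


keys = ["a", "b", "c", "d"]  # a map with N = 4 entries
for k in (1, 2, 3):
    count = len(candidate_bindings(keys, k))
    print(f"K={k}: {count} candidates, bound N^K={len(keys) ** k}")
```

For N = 4 the counts are 4, 12, and 24, each bounded by N^K, which is why binding keys elsewhere in the rule before matching on the map keeps matching cheap.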
The map with zero elements is represented by .Map.
syntax Map ::= ".Map" [function, total, hook(MAP.unit), symbol(.Map)]
An element of a Map is constructed via the |-> operator. The key is on the
left and the value is on the right.
syntax Map ::= KItem "|->" KItem [function, total, hook(MAP.element), symbol(_|->_), injective]

syntax priority _|->_ > _Map_ .Map
syntax non-assoc _|->_
You can look up the value associated with the key of a map in O(log(N)) time.
Note that the base of the logarithm is a relatively high number and thus the
time is effectively constant. The value is #False if the key is not in the
map (in particular, this will yield an exception during concrete execution).
syntax KItem ::= Map "[" KItem "]" [function, hook(MAP.lookup), symbol(Map:lookup)]
You can also look up the value associated with the key of a map using a
total function that assigns a specific default value if the key is not present
in the map. This operation is also O(log(N)), or effectively constant.
syntax KItem ::= Map "[" KItem "]" "orDefault" KItem [function, total, hook(MAP.lookupOrDefault), symbol(Map:lookupOrDefault)]
You can insert a key/value pair into a map in O(log(N)) time, or effectively
constant.
syntax Map ::= Map "[" key: KItem "<-" value: KItem "]" [function, total, symbol(Map:update), hook(MAP.update), prefer]
You can remove a key/value pair from a map via its key in O(log(N)) time, or
effectively constant.
syntax Map ::= Map "[" KItem "<-" "undef" "]" [function, total, hook(MAP.remove), symbol(_[_<-undef])]
You can remove the key/value pairs in a map that are present in another map in
O(N*log(M)) time (where M is the size of the first map and N is the size of the
second), or effectively linear. Note that only keys whose value is the same
in both maps are removed. To remove all the keys in one map from another map,
you can say removeAll(M1, keys(M2)).
syntax Map ::= Map "-Map" Map [function, total, hook(MAP.difference)]
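The difference between -Map and removeAll is easy to miss, so here is a small Python model using dictionaries. It is a sketch of the behavior described above rather than the hooked implementation, and the function names map_difference and remove_all are hypothetical.

```python
def map_difference(m1, m2):
    # M1 -Map M2: drop a binding only if M2 holds the same key with the same value.
    return {k: v for k, v in m1.items() if not (k in m2 and m2[k] == v)}


def remove_all(m, keys):
    # removeAll(M, S): drop every key in S regardless of its value.
    return {k: v for k, v in m.items() if k not in keys}


m1 = {"x": 1, "y": 2, "z": 3}
m2 = {"x": 1, "y": 99}

print(map_difference(m1, m2))         # "y" survives: its values differ
print(remove_all(m1, set(m2.keys()))) # "y" is removed as well
```

Here the key "y" is bound to different values in the two maps, so the difference keeps it while removeAll over the keys of the second map drops it.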
You can update a map by adding all the key/value pairs in the second map in
O(N*log(M)) time (where M is the size of the first map and N is the size of the
second map), or effectively linear. If any keys are present in both maps, the
value from the second map overwrites the value in the first. This function is
total, which is distinct from map concatenation, a partial function only
defined on maps with disjoint keys.
syntax Map ::= updateMap(Map, Map) [function, total, hook(MAP.updateAll)]
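The contrast between the partial concatenation operator and the total updateMap function can be modeled in a few lines of Python. This is only a sketch of the semantics stated above; the names concat and update_map are hypothetical, and raising ValueError stands in for the exception seen during concrete execution.

```python
def concat(m1, m2):
    # Models `M1 M2`: only defined when the key sets are disjoint.
    common = m1.keys() & m2.keys()
    if common:
        raise ValueError(f"duplicate keys: {sorted(common, key=repr)}")
    return {**m1, **m2}


def update_map(m1, m2):
    # Models updateMap(M1, M2): total, values from M2 win on shared keys.
    return {**m1, **m2}


base = {"x": 1, "y": 2}
print(update_map(base, {"y": 20, "z": 30}))
print(concat(base, {"z": 30}))
try:
    concat(base, {"y": 20})
except ValueError as err:
    print("concat failed:", err)
```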
You can remove a Set of keys from a map in O(N*log(M)) time (where M is the
size of the Map and N is the size of the Set), or effectively linear.
syntax Map ::= removeAll(Map, Set) [function, total, hook(MAP.removeAll)]
You can get a Set of all the keys in a Map in O(N) time.
syntax Set ::= keys(Map) [function, total, hook(MAP.keys)]
You can get a List of all the keys in a Map in O(N) time.
syntax List ::= "keys_list" "(" Map ")" [function, hook(MAP.keys_list)]
You can check whether a key is present in a map in O(1) time.
syntax Bool ::= KItem "in_keys" "(" Map ")" [function, total, hook(MAP.in_keys)]
You can get a List of all the values in a map in O(N) time.
syntax List ::= values(Map) [function, hook(MAP.values)]
You can get the number of key/value pairs in a map in O(1) time.
syntax Int ::= size(Map) [function, total, hook(MAP.size), symbol(sizeMap)]
You can determine whether a Map is a subset of (or equal to) another Map in
O(N) time (where N is the size of the first map). Only keys that are bound to
the same value are considered equal.
syntax Bool ::= Map "<=Map" Map [function, total, hook(MAP.inclusion)]
You can get an arbitrarily chosen key of a Map in O(1) time. The same key
will always be returned for the same map, but no guarantee is given that two
different maps will return the same element, even if they are similar.
syntax KItem ::= choice(Map) [function, hook(MAP.choice), symbol(Map:choice)]
The remainder of this section contains lemmas used by the Java and Haskell
backend to simplify expressions of sort Map. They do not affect the semantics
of maps, merely describing additional rules that the backend can use to
simplify terms.
endmodule

module MAP-KORE-SYMBOLIC [symbolic,haskell]
  imports MAP
  imports private K-EQUAL
  imports private BOOL

  rule #Ceil(@M:Map [@K:KItem]) => {(@K in_keys(@M)) #Equals true} #And #Ceil(@M) #And #Ceil(@K) [simplification]

  // Symbolic update

  // Adding the definedness condition `notBool (K in_keys(M))` in the ensures clause of the following rule would be redundant
  // because K also appears in the rhs, preserving the case when it's #Bottom.
  rule (K |-> _ M:Map) [ K <- V ] => (K |-> V M) [simplification]
  rule M:Map [ K <- V ] => (K |-> V M) requires notBool (K in_keys(M)) [simplification]
  rule M:Map [ K <- _ ] [ K <- V ] => M [ K <- V ] [simplification]
  // Adding the definedness condition `notBool (K1 in_keys(M))` in the ensures clause of the following rule would be redundant
  // because K1 also appears in the rhs, preserving the case when it's #Bottom.
  rule (K1 |-> V1 M:Map) [ K2 <- V2 ] => (K1 |-> V1 (M [ K2 <- V2 ])) requires K1 =/=K K2 [simplification]

  // Symbolic remove
  rule (K |-> _ M:Map) [ K <- undef ] => M ensures notBool (K in_keys(M)) [simplification]
  rule M:Map [ K <- undef ] => M requires notBool (K in_keys(M)) [simplification]
  // Adding the definedness condition `notBool (K1 in_keys(M))` in the ensures clause of the following rule would be redundant
  // because K1 also appears in the rhs, preserving the case when it's #Bottom.
  rule (K1 |-> V1 M:Map) [ K2 <- undef ] => (K1 |-> V1 (M [ K2 <- undef ])) requires K1 =/=K K2 [simplification]

  // Symbolic lookup
  rule (K |-> V M:Map) [ K ] => V ensures notBool (K in_keys(M)) [simplification]
  rule (K1 |-> _V M:Map) [ K2 ] => M [K2] requires K1 =/=K K2 ensures notBool (K1 in_keys(M)) [simplification]
  rule (_MAP:Map [ K <- V1 ]) [ K ] => V1 [simplification]
  rule ( MAP:Map [ K1 <- _V1 ]) [ K2 ] => MAP [ K2 ] requires K1 =/=K K2 [simplification]

  rule (K |-> V M:Map) [ K ] orDefault _ => V ensures notBool (K in_keys(M)) [simplification]
  rule (K1 |-> _V M:Map) [ K2 ] orDefault D => M [K2] orDefault D requires K1 =/=K K2 ensures notBool (K1 in_keys(M)) [simplification]
  rule (_MAP:Map [ K <- V1 ]) [ K ] orDefault _ => V1 [simplification]
  rule ( MAP:Map [ K1 <- _V1 ]) [ K2 ] orDefault D => MAP [ K2 ] orDefault D requires K1 =/=K K2 [simplification]
  rule .Map [ _ ] orDefault D => D [simplification]

  // Symbolic in_keys
  rule K in_keys(_M [ K <- undef ]) => false [simplification]
  rule K in_keys(_M [ K <- _ ]) => true [simplification]
  rule K1 in_keys(M [ K2 <- _ ]) => true requires K1 ==K K2 orBool K1 in_keys(M) [simplification]
  rule K1 in_keys(M [ K2 <- _ ]) => K1 in_keys(M) requires K1 =/=K K2 [simplification]

  rule {false #Equals @Key in_keys(.Map)} => #Ceil(@Key) [simplification]
  rule {@Key in_keys(.Map) #Equals false} => #Ceil(@Key) [simplification]
  rule {false #Equals @Key in_keys(Key' |-> Val @M)} => #Ceil(@Key) #And #Ceil(Key' |-> Val @M) #And #Not({@Key #Equals Key'}) #And {false #Equals @Key in_keys(@M)} [simplification]
  rule {@Key in_keys(Key' |-> Val @M) #Equals false} => #Ceil(@Key) #And #Ceil(Key' |-> Val @M) #And #Not({@Key #Equals Key'}) #And {@Key in_keys(@M) #Equals false} [simplification]

  /*
  // The rule below is automatically generated by the frontend for every sort
  // hooked to MAP.Map. It is left here to serve as documentation.
  rule #Ceil(@M:Map (@K:KItem |-> @V:KItem)) => {(@K in_keys(@M)) #Equals false} #And #Ceil(@M) #And #Ceil(@K) #And #Ceil(@V) [simplification]
  */
endmodule

module MAP-SYMBOLIC
  imports MAP-KORE-SYMBOLIC
endmodule
Provided here is the syntax of an implementation of immutable, associative,
commutative range maps from Int to KItem. This type is hooked to an
implementation of range maps provided by the LLVM backend.
Currently, this type is not supported by other backends.
Although the underlying range map data structure supports any key sort, the
current implementation by the backend only supports Int keys due to
limitations of the underlying ordering function.
module RANGEMAP
  imports private BOOL-SYNTAX
  imports private INT-SYNTAX
  imports private LIST
  imports private SET

  syntax Range ::= "[" KItem "," KItem ")" [symbol(RangeMap:Range)]

  syntax RangeMap [hook(RANGEMAP.RangeMap)]
The RangeMap sort represents a map whose keys are stored as ranges, bounded
inclusively below and exclusively above. Contiguous or overlapping ranges that
map to the same value are merged into a single range.
You can construct a new RangeMap consisting of range/value pairs of two
RangeMaps. If the RangeMaps have overlapping ranges, an exception will be
thrown during concrete execution. This operation is O(N*log(M)) (where N is
the size of the smaller map and M is the size of the larger map).
syntax RangeMap ::= RangeMap RangeMap [left, function, hook(RANGEMAP.concat), symbol(_RangeMap_), assoc, comm, unit(.RangeMap), element(_r|->_), index(0), format(%1%n%2)]
The RangeMap with zero elements is represented by .RangeMap.
syntax RangeMap ::= ".RangeMap" [function, total, hook(RANGEMAP.unit), symbol(.RangeMap)]
An element of a RangeMap is constructed via the r|-> operator. The range
of keys is on the left, and the value is on the right.
syntax RangeMap ::= Range "r|->" KItem [function, hook(RANGEMAP.elementRng), symbol(_r|->_), injective]

syntax priority _r|->_ > _RangeMap_ .RangeMap
syntax non-assoc _r|->_
You can look up the value associated with a key of a RangeMap in O(log(N))
time (where N is the size of the RangeMap). This will yield an exception
during concrete execution if the key is not in the range map.
syntax KItem ::= RangeMap "[" KItem "]" [function, hook(RANGEMAP.lookup), symbol(RangeMap:lookup)]
You can also look up the value associated with a key of a RangeMap using a
total function that assigns a specific default value if the key is not present
in the RangeMap. This operation is also O(log(N)) (where N is the size of
the range map).
syntax KItem ::= RangeMap "[" KItem "]" "orDefault" KItem [function, total, hook(RANGEMAP.lookupOrDefault), symbol(RangeMap:lookupOrDefault)]
You can look up the range in which a key of a RangeMap is stored in
O(log(N)) time (where N is the size of the RangeMap). This will yield an
exception during concrete execution if the key is not in the range map.
syntax Range ::= "find_range" "(" RangeMap "," KItem ")" [function, hook(RANGEMAP.find_range), symbol(RangeMap:find_range)]
You can insert a range/value pair into a RangeMap in O(log(N)) time (where N
is the size of the RangeMap). Any ranges adjacent to or overlapping with the
range to be inserted will be updated accordingly.
syntax RangeMap ::= RangeMap "[" keyRange: Range "<-" value: KItem "]" [function, symbol(RangeMap:update), hook(RANGEMAP.updateRng), prefer]
You can remove a range/value pair from a RangeMap in O(log(N)) time (where N
is the size of the RangeMap). If all or any part of the range is present in
the range map, it will be removed.
syntax RangeMap ::= RangeMap "[" Range "<-" "undef" "]" [function, hook(RANGEMAP.removeRng), symbol(_r[_<-undef])]
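The merging and splitting behavior of range insertion and removal can be illustrated with a deliberately naive Python model. It expands ranges into individual integer keys and rebuilds the ranges afterwards, so it demonstrates only the resulting contents as described above, not the O(log(N)) data structure used by the LLVM backend. All function names here are hypothetical.

```python
def expand(ranges):
    # Turn [lo, hi) ranges into one entry per integer key.
    return {k: v for lo, hi, v in ranges for k in range(lo, hi)}


def normalize(points):
    # Rebuild sorted [lo, hi) ranges, merging contiguous keys with equal values.
    ranges = []
    for key in sorted(points):
        value = points[key]
        if ranges and ranges[-1][1] == key and ranges[-1][2] == value:
            lo, _, _ = ranges[-1]
            ranges[-1] = (lo, key + 1, value)
        else:
            ranges.append((key, key + 1, value))
    return ranges


def update(ranges, lo, hi, value):
    # Models RM [ [lo, hi) <- value ]: overwrite, then merge neighbours.
    points = expand(ranges)
    for k in range(lo, hi):
        points[k] = value
    return normalize(points)


def remove(ranges, lo, hi):
    # Models RM [ [lo, hi) <- undef ]: drop whatever part of the range is present.
    points = expand(ranges)
    for k in range(lo, hi):
        points.pop(k, None)
    return normalize(points)


rm = update([], 0, 5, "a")
rm = update(rm, 5, 8, "a")   # adjacent range with the same value is merged
print(rm)
rm = update(rm, 2, 4, "b")   # overlapping range splits the existing one
print(rm)
print(remove(rm, 3, 6))      # partial removal trims the affected ranges
```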
You can remove the range/value pairs in a RangeMap that are also present in
another RangeMap in O(max{M,N}*log(M)) time (where M is the size of the
first RangeMap and N is the size of the second RangeMap). Note that only
the parts of overlapping ranges whose value is the same in both range maps
will be removed.
syntax RangeMap ::= RangeMap "-RangeMap" RangeMap [function, total, hook(RANGEMAP.difference)]
You can update a RangeMap by adding all the range/value pairs in the second
RangeMap in O(N*log(M+N)) time (where M is the size of the first RangeMap
and N is the size of the second RangeMap). If any ranges are overlapping,
the value from the second range map overwrites the value in the first for the
parts where ranges are overlapping. This function is total, which is distinct
from range map concatenation, a partial function only defined on range maps
with non-overlapping ranges.
syntax RangeMap ::= updateRangeMap(RangeMap, RangeMap) [function, total, hook(RANGEMAP.updateAll)]
You can remove a Set of ranges from a RangeMap in O(N*log(M)) time (where
M is the size of the RangeMap and N is the size of the Set). For every
range in the set, all or any part of it that is present in the range map will
be removed.
syntax RangeMap ::= removeAll(RangeMap, Set) [function, hook(RANGEMAP.removeAll)]
You can get a Set of all the ranges in a RangeMap in O(N) time (where N
is the size of the RangeMap).
syntax Set ::= keys(RangeMap) [function, total, hook(RANGEMAP.keys)]
You can get a List of all the ranges in a RangeMap in O(N) time (where N
is the size of the RangeMap).
syntax List ::= "keys_list" "(" RangeMap ")" [function, hook(RANGEMAP.keys_list)]
You can check whether a key is present in a RangeMap in O(log(N)) time (where
N is the size of the RangeMap).
syntax Bool ::= KItem "in_keys" "(" RangeMap ")" [function, total, hook(RANGEMAP.in_keys)]
You can get a List of all values in a RangeMap in O(N) time (where N is the
size of the RangeMap).
syntax List ::= values(RangeMap) [function, hook(RANGEMAP.values)]
You can get the number of range/value pairs in a RangeMap in O(1) time.
syntax Int ::= size(RangeMap) [function, total, hook(RANGEMAP.size), symbol(sizeRangeMap)]
You can determine whether a RangeMap is a subset of (or equal to) another
RangeMap in O(M+N) time (where M is the size of the first RangeMap and N is
the size of the second RangeMap). Only keys within equal or overlapping ranges
that are bound to the same value are considered equal.
syntax Bool ::= RangeMap "<=RangeMap" RangeMap [function, total, hook(RANGEMAP.inclusion)]
You can get an arbitrarily chosen key of a RangeMap in O(1) time. The same
key will always be returned for the same range map, but no guarantee is given
that two different range maps will return the same element, even if they are
similar.
syntax KItem ::= choice(RangeMap) [function, hook(RANGEMAP.choice), symbol(RangeMap:choice)]
endmodule
Provided here is the syntax of an implementation of immutable, associative,
commutative sets of KItem. This type is hooked to an implementation of sets
provided by the backend. For more information on matching on sets and allowable
patterns for doing so, refer to K's
user documentation.
module SET
  imports private INT-SYNTAX
  imports private BASIC-K

  syntax Set [hook(SET.Set)]
The Set sort represents a mathematical set (a collection of unique items).
The sets are nilpotent, i.e., the concatenation of two sets containing elements
in common is #False (note however, this may be silently allowed during
concrete execution). If you intend to add an element to a set that might
already be present in the set, use the |Set operator instead.
The concatenation operator is O(N*log(M)), where N is the size of the smaller
set and M is the size of the larger set, when it appears on the right hand
side. When it appears on the left hand side and all variables are bound, it is
O(N*log(M)) where M is the size of the set it is matching and N is the number
of elements being matched. When it appears on the left hand side containing
variables not bound elsewhere in the term, it is O(N^K) where N is the size of
the set it is matching and K is the number of unbound keys being matched. In
other words, one unbound variable is linear, two is quadratic, three is cubic,
etc.
syntax Set ::= Set Set [left, function, hook(SET.concat), symbol(_Set_), assoc, comm, unit(.Set), idem, element(SetItem), format(%1%n%2)]
The set with zero elements is represented by .Set
.
syntax Set ::= ".Set" [function, total, hook(SET.unit), symbol(.Set)]
An element of a Set
is constructed via the SetItem
operator.
syntax Set ::= SetItem(KItem) [function, total, hook(SET.element), symbol(SetItem), injective]
You can compute the union of two sets in O(N*log(M)) time (where N is the size
of the smaller set). Note that the base of the logarithm is a relatively high
number and thus the time is effectively linear. The union consists of all the
elements present in either set.
syntax Set ::= Set "|Set" Set [left, function, total, hook(SET.union), comm] rule S1:Set |Set S2:Set => S1 (S2 -Set S1) [concrete]
You can compute the intersection of two sets in O(N*log(M)) time (where N
is the size of the smaller set), or effectively linear. The intersection
consists of all the elements present in both sets.
syntax Set ::= intersectSet(Set, Set) [function, total, hook(SET.intersection), comm]
You can compute the relative complement of two sets in O(N*log(M)) time (where
N is the size of the second set), or effectively linear. This is the set of
elements in the first set that are not present in the second set.
syntax Set ::= Set "-Set" Set [function, total, hook(SET.difference), symbol(Set:difference)]
You can compute whether an element is a member of a set in O(1) time.
syntax Bool ::= KItem "in" Set [function, total, hook(SET.in), symbol(Set:in)]
You can determine whether a Set is a subset (not necessarily strict) of
another Set in O(N) time (where N is the size of the first set).
syntax Bool ::= Set "<=Set" Set [function, total, hook(SET.inclusion)]
You can get the number of elements (the cardinality) of a set in O(1) time.
syntax Int ::= size(Set) [function, total, hook(SET.size)]
You can get an arbitrarily chosen element of a Set
in O(1) time. The same
element will always be returned for the same set, but no guarantee is given
that two different sets will return the same element, even if they are similar.
syntax KItem ::= choice(Set) [function, hook(SET.choice), symbol(Set:choice)]
endmodule
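The following is a minimal sketch of how the operations above might be combined. The module name SET-EXAMPLE and the helper functions addElem and disjoint are hypothetical names introduced only for illustration; they are not part of the K standard library.

```k
module SET-EXAMPLE
  imports SET
  imports INT
  imports BOOL

  // addElem is a hypothetical helper. It uses |Set, which is safe even if I
  // is already in S, whereas the concatenation `S SetItem(I)` is #False when
  // I is already present.
  syntax Set ::= addElem(Set, Int) [function, total]
  rule addElem(S:Set, I:Int) => S |Set SetItem(I)

  // disjoint is a hypothetical helper: two sets are disjoint when their
  // intersection has no elements.
  syntax Bool ::= disjoint(Set, Set) [function, total]
  rule disjoint(S1:Set, S2:Set) => size(intersectSet(S1, S2)) ==Int 0
endmodule
```

Using |Set rather than concatenation on the right hand side avoids having to prove that the new element is absent from the set.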
The following lemmas are simplifications that the Haskell backend can
apply to simplify expressions of sort Set
.
module SET-KORE-SYMBOLIC [symbolic,haskell]
  imports SET
  imports private K-EQUAL
  imports private BOOL

  //Temporary rule for #Ceil simplification, should be generated in front-end

  // Matching for this version not implemented.
  // rule #Ceil(@S1:Set @S2:Set) =>
  //        {intersectSet(@S1, @S2) #Equals .Set} #And #Ceil(@S1) #And #Ceil(@S2)
  //   [simplification]

  //simpler version
  rule #Ceil(@S:Set SetItem(@E:KItem)) =>
         {(@E in @S) #Equals false} #And #Ceil(@S) #And #Ceil(@E)
    [simplification]

  // -Set simplifications
  rule S -Set .Set => S [simplification]
  rule .Set -Set _ => .Set [simplification]
  rule SetItem(X) -Set (S SetItem(X)) => .Set
    ensures notBool (X in S) [simplification]
  rule S -Set (S SetItem(X)) => .Set
    ensures notBool (X in S) [simplification]
  rule (S SetItem(X)) -Set S => SetItem(X)
    ensures notBool (X in S) [simplification]
  rule (S SetItem(X)) -Set SetItem(X) => S
    ensures notBool (X in S) [simplification]
  // rule SetItem(X) -Set S => SetItem(X)
  //   requires notBool (X in S) [simplification]
  // rule (S1 SetItem(X)) -Set (S2 SetItem(X)) => S1 -Set S2
  //   ensures notBool (X in S1)
  //   andBool notBool (X in S2) [simplification]

  // |Set simplifications
  rule S |Set .Set => S [simplification, comm]
  rule S |Set S => S [simplification]
  rule (S SetItem(X)) |Set SetItem(X) => S SetItem(X)
    ensures notBool (X in S) [simplification, comm]
  // Currently disabled, see runtimeverification/haskell-backend#3301
  // rule (S SetItem(X)) |Set S => S SetItem(X)
  //   ensures notBool (X in S) [simplification, comm]

  // intersectSet simplifications
  rule intersectSet(.Set, _ ) => .Set [simplification, comm]
  rule intersectSet( S , S ) => S [simplification]
  rule intersectSet( S SetItem(X), SetItem(X)) => SetItem(X)
    ensures notBool (X in S) [simplification, comm]
  // Currently disabled, see runtimeverification/haskell-backend#3294
  // rule intersectSet( S SetItem(X) , S) => S ensures notBool (X in S) [simplification, comm]
  rule intersectSet( S1 SetItem(X), S2 SetItem(X)) => intersectSet(S1, S2) SetItem(X)
    ensures notBool (X in S1) andBool notBool (X in S2) [simplification]

  // membership simplifications
  rule _E in .Set => false [simplification]
  rule E in (S SetItem(E)) => true
    ensures notBool (E in S) [simplification]

  // These two rules would be sound but impose a giant overhead on `in` evaluation:
  // rule E1 in (S SetItem(E2)) => true requires E1 in S
  //   ensures notBool (E2 in S) [simplification]
  // rule E1 in (S SetItem(E2)) => E1 in S requires E1 =/=K E2
  //   ensures notBool (E2 in S) [simplification]

  rule X in ((SetItem(X) S) |Set _ ) => true
    ensures notBool (X in S) [simplification]
  rule X in ( _ |Set (SetItem(X) S)) => true
    ensures notBool (X in S) [simplification]
endmodule

module SET-SYMBOLIC
  imports SET-KORE-SYMBOLIC
endmodule
Provided here is the syntax of an implementation of immutable, associative
lists of KItem
. This type is hooked to an implementation of lists provided
by the backend. For more information on matching on lists and allowable
patterns for doing so, refer to K's
user documentation.
module LIST imports private INT-SYNTAX imports private BASIC-K syntax List [hook(LIST.List)]
The List
sort is an ordered collection that may contain duplicate elements.
They are backed by relaxed radix balanced trees, which means that they support
efficiently adding elements to both sides of the list, concatenating two lists,
indexing, and updating elements.
The concatenation operator is O(log(N)) (where N is the size of the longer
list) when it appears on the right hand side. When it appears on the left hand
side, it is O(N), where N is the number of elements matched on the front and
back of the list.
syntax List ::= List List [left, function, total, hook(LIST.concat), symbol(_List_), smtlib(smt_seq_concat), assoc, unit(.List), element(ListItem), update(List:set), format(%1%n%2)]
The list with zero elements is represented by .List
.
syntax List ::= ".List" [function, total, hook(LIST.unit), symbol(.List), smtlib(smt_seq_nil)]
An element of a List is constructed via the ListItem operator.
syntax List ::= ListItem(KItem) [function, total, hook(LIST.element), symbol(ListItem), smtlib(smt_seq_elem)]
An element can be added to the front of a List
using the pushList
operator.
syntax List ::= pushList(KItem, List) [function, total, hook(LIST.push), symbol(pushList)] rule pushList(K::KItem, L1::List) => ListItem(K) L1
You can get an element of a list by its integer offset in O(log(N)) time, or
effectively constant. Positive indices are 0-indexed from the beginning of the
list, and negative indices are -1-indexed from the end of the list. In other
words, 0 is the first element and -1 is the last element.
syntax KItem ::= List "[" Int "]" [function, hook(LIST.get), symbol(List:get)]
You can create a new List
with a new value at a particular index in
O(log(N)) time, or effectively constant.
syntax List ::= List "[" index: Int "<-" value: KItem "]" [function, hook(LIST.update), symbol(List:set)]
You can create a list with length
elements, each containing value
, in O(N)
time.
syntax List ::= makeList(length: Int, value: KItem) [function, hook(LIST.make)]
You can create a new List
which is equal to dest
except the N
elements
starting at index
are replaced with the contents of src
in O(N*log(K)) time
(where K
is the size of dest
and N
is the size of src
), or effectively linear. Having index + N > K
yields an exception.
syntax List ::= updateList(dest: List, index: Int, src: List) [function, hook(LIST.updateAll)]
You can create a new List
where the length
elements starting at index
are replaced with value
, in O(length*log(N)) time, or effectively linear.
syntax List ::= fillList(List, index: Int, length: Int, value: KItem) [function, hook(LIST.fill)]
You can compute a new List
by removing fromFront
elements from the front
of the list and fromBack
elements from the back of the list in
O((fromFront+fromBack)*log(N)) time, or effectively linear.
syntax List ::= range(List, fromFront: Int, fromBack: Int) [function, hook(LIST.range), symbol(List:range)]
You can compute whether an element is in a list in O(N) time. For repeated
comparisons, it is much better to first convert to a set using List2Set
.
syntax Bool ::= KItem "in" List [function, total, hook(LIST.in), symbol(_inList_)]
You can get the number of elements of a list in O(1) time.
syntax Int ::= size(List) [function, total, hook(LIST.size), symbol(sizeList), smtlib(smt_seq_len)]
endmodule
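As a rough illustration of the list operations above, the sketch below builds a few small helpers. The module name LIST-EXAMPLE and the functions zeros, lastElem, and dropEnds are hypothetical names chosen for this example only.

```k
module LIST-EXAMPLE
  imports LIST
  imports INT
  imports BOOL

  // zeros(N) is a hypothetical helper: a list of N elements, each 0.
  syntax List ::= zeros(Int) [function]
  rule zeros(N:Int) => makeList(N, 0) requires N >=Int 0

  // lastElem is a hypothetical helper: index -1 refers to the last element.
  syntax KItem ::= lastElem(List) [function]
  rule lastElem(L:List) => L [ -1 ] requires size(L) >Int 0

  // dropEnds is a hypothetical helper: range removes one element from the
  // front and one element from the back of the list.
  syntax List ::= dropEnds(List) [function]
  rule dropEnds(L:List) => range(L, 1, 1) requires size(L) >=Int 2
endmodule
```

The side conditions keep each rule from applying where the underlying operation would be undefined, such as indexing into an empty list.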
It is possible to convert from a List to a Set or from a Set to a List.
Converting from a List
to a Set
and back will not provide the same list;
duplicates will have been removed and the list may be reordered. Converting
from a Set
to a List
and back will generate the same set.
Note that because sets are unordered and lists are ordered, converting from a
Set to a List will generate some arbitrary ordering of elements, which may
be different from the natural ordering you might assume, or may not. Two
equal sets are guaranteed to generate the same ordering, but no guarantee is
otherwise provided about what the ordering will be. In particular, adding an
element to a set may completely reorder the elements already in the set, when
it is converted to a list.
module COLLECTIONS imports LIST imports SET imports MAP syntax List ::= Set2List(Set) [function, total, hook(SET.set2list)] syntax Set ::= List2Set(List) [function, total, hook(SET.list2set)] endmodule
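The conversions described above can be sketched as follows. The module name COLLECTIONS-EXAMPLE and the functions countDistinct and dedup are hypothetical and exist only to illustrate List2Set and Set2List.

```k
module COLLECTIONS-EXAMPLE
  imports COLLECTIONS
  imports INT

  // countDistinct is a hypothetical helper: converting to a Set discards
  // duplicates, so the size of the set is the number of distinct elements.
  syntax Int ::= countDistinct(List) [function, total]
  rule countDistinct(L:List) => size(List2Set(L))

  // dedup is a hypothetical helper: the round trip removes duplicates, but
  // the order of the resulting list is unspecified.
  syntax List ::= dedup(List) [function, total]
  rule dedup(L:List) => Set2List(List2Set(L))
endmodule
```

Because the ordering produced by Set2List is arbitrary, dedup should only be relied on where element order does not matter.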
Provided here is the syntax of an implementation of boolean algebra in K.
This type is hooked to an implementation of booleans provided by the backend.
Note that this algebra is different from the builtin truth in matching logic.
You can, however, convert from the truth of the Bool
sort to the truth in
matching logic via the expression {B #Equals true}
.
The boolean values are true
and false
.
module SORT-BOOL syntax Bool [hook(BOOL.Bool)] endmodule module BOOL-SYNTAX imports SORT-BOOL syntax Bool ::= "true" [token] syntax Bool ::= "false" [token] endmodule module BOOL-COMMON imports private BASIC-K imports BOOL-SYNTAX
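You can:
* Negate a boolean value (notBool).
* AND two boolean values (andBool, andThenBool).
* XOR two boolean values (xorBool).
* OR two boolean values (orBool, orElseBool).
* IMPLIES two boolean values (i.e., P impliesBool Q is the same as
  notBool P orBool Q).
* Check equality of two boolean values (==Bool).
* Check inequality of two boolean values (=/=Bool).
Note that only andThenBool and orElseBool are short-circuiting. andBool
and orBool may be short-circuited in concrete backends, but in symbolic
backends, both arguments will be evaluated.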
syntax Bool ::= "notBool" Bool [function, total, symbol(notBool_), smt-hook(not), group(boolOperation), hook(BOOL.not)] > Bool "andBool" Bool [function, total, symbol(_andBool_), left, smt-hook(and), group(boolOperation), hook(BOOL.and)] | Bool "andThenBool" Bool [function, total, symbol(_andThenBool_), left, smt-hook(and), group(boolOperation), hook(BOOL.andThen)] | Bool "xorBool" Bool [function, total, symbol(_xorBool_), left, smt-hook(xor), group(boolOperation), hook(BOOL.xor)] | Bool "orBool" Bool [function, total, symbol(_orBool_), left, smt-hook(or), group(boolOperation), hook(BOOL.or)] | Bool "orElseBool" Bool [function, total, symbol(_orElseBool_), left, smt-hook(or), group(boolOperation), hook(BOOL.orElse)] | Bool "impliesBool" Bool [function, total, symbol(_impliesBool_), left, smt-hook(=>), group(boolOperation), hook(BOOL.implies)] > left: Bool "==Bool" Bool [function, total, symbol(_==Bool_), left, comm, smt-hook(=), hook(BOOL.eq)] | Bool "=/=Bool" Bool [function, total, symbol(_=/=Bool_), left, comm, smt-hook(distinct), hook(BOOL.ne)]
The remainder of this section consists of an implementation in K of the
operations listed above.
rule notBool true => false rule notBool false => true rule true andBool B:Bool => B:Bool rule B:Bool andBool true => B:Bool [simplification] rule false andBool _:Bool => false rule _:Bool andBool false => false [simplification] rule true andThenBool K::Bool => K rule K::Bool andThenBool true => K [simplification] rule false andThenBool _ => false rule _ andThenBool false => false [simplification] rule false xorBool B:Bool => B:Bool rule B:Bool xorBool false => B:Bool [simplification] rule B:Bool xorBool B:Bool => false rule true orBool _:Bool => true rule _:Bool orBool true => true [simplification] rule false orBool B:Bool => B rule B:Bool orBool false => B [simplification] rule true orElseBool _ => true rule _ orElseBool true => true [simplification] rule false orElseBool K::Bool => K rule K::Bool orElseBool false => K [simplification] rule true impliesBool B:Bool => B rule false impliesBool _:Bool => true rule _:Bool impliesBool true => true [simplification] rule B:Bool impliesBool false => notBool B [simplification] rule B1:Bool =/=Bool B2:Bool => notBool (B1 ==Bool B2) endmodule module BOOL-KORE [symbolic] imports BOOL-COMMON rule {true #Equals notBool @B} => {false #Equals @B} [simplification] rule {notBool @B #Equals true} => {@B #Equals false} [simplification] rule {false #Equals notBool @B} => {true #Equals @B} [simplification] rule {notBool @B #Equals false} => {@B #Equals true} [simplification] rule {true #Equals @B1 andBool @B2} => {true #Equals @B1} #And {true #Equals @B2} [simplification] rule {@B1 andBool @B2 #Equals true} => {@B1 #Equals true} #And {@B2 #Equals true} [simplification] rule {false #Equals @B1 orBool @B2} => {false #Equals @B1} #And {false #Equals @B2} [simplification] rule {@B1 orBool @B2 #Equals false} => {@B1 #Equals false} #And {@B2 #Equals false} [simplification] endmodule module BOOL imports BOOL-COMMON imports BOOL-KORE endmodule
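To illustrate the difference between andBool and andThenBool, here is a small sketch in which the second operand is only meaningful when the first one holds. The module name BOOL-EXAMPLE and the function safeDivides are hypothetical.

```k
module BOOL-EXAMPLE
  imports BOOL
  imports INT

  // safeDivides(D, N) is a hypothetical helper that checks whether D evenly
  // divides N. andThenBool is short-circuiting, so the modulus on the right
  // is only needed when D is nonzero.
  syntax Bool ::= safeDivides(Int, Int) [function]
  rule safeDivides(D:Int, N:Int)
    => (D =/=Int 0) andThenBool ((N %Int D) ==Int 0)
endmodule
```

With andBool in place of andThenBool, a symbolic backend would evaluate both operands, including the modulus by a possibly zero divisor.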
Provided here is the syntax of an implementation of arbitrary-precision
integer arithmetic in K. This type is hooked to an implementation of integers
provided by the backend. For a fixed-width integer type, see the MINT
module
below.
The UNSIGNED-INT-SYNTAX
module provides a syntax of whole numbers in K.
This is useful because often programming languages implement the sign of an
integer as a unary operator rather than part of the lexical syntax of integers.
However, you can also directly reference integers with a sign using the
INT-SYNTAX
module.
module UNSIGNED-INT-SYNTAX syntax Int [hook(INT.Int)] syntax Int ::= r"[0-9]+" [prefer, token, prec(2)] endmodule module INT-SYNTAX imports UNSIGNED-INT-SYNTAX syntax Int ::= r"[\\+\\-]?[0-9]+" [prefer, token, prec(2)] endmodule module INT-COMMON imports INT-SYNTAX imports private BOOL
You can:
* Compute the bitwise complement ~Int of an integer value in twos-complement.
* Compute the exponentiation ^Int of two integers.
* Compute the exponentiation of two integers modulo another integer (^%Int).
  A ^%Int B C is equal in value to (A ^Int B) %Int C, but has a better
  asymptotic complexity.
* Compute the product *Int of two integers.
* Compute the quotient /Int or modulus %Int of two integers using
  t-division, which rounds towards zero. Division by zero is #False.
* Compute the quotient divInt or modulus modInt of two integers using
  Euclidean division, in which the remainder is always non-negative. Division
  by zero is #False.
* Compute the sum +Int or difference -Int of two integers.
* Compute the arithmetic right shift >>Int of two integers. Shifting by a
  negative quantity is #False.
* Compute the left shift <<Int of two integers. Shifting by a negative
  quantity is #False.
* Compute the bitwise and &Int of two integers in twos-complement.
* Compute the bitwise xor xorInt of two integers in twos-complement.
* Compute the bitwise inclusive-or |Int of two integers in twos-complement.
syntax Int ::= "~Int" Int [function, symbol(~Int_), total, hook(INT.not), smtlib(notInt)]
             > left:
               Int "^Int" Int [function, symbol(_^Int_), left, smt-hook(^), hook(INT.pow)]
             | Int "^%Int" Int Int [function, symbol(_^%Int__), left, smt-hook((mod (^ #1 #2) #3)), hook(INT.powmod)]
             > left:
               Int "*Int" Int [function, total, symbol(_*Int_), left, comm, smt-hook(*), hook(INT.mul)]
             /* FIXME: translate /Int and %Int into smtlib */
             /* /Int and %Int implement t-division, which rounds towards 0. SMT hooks need to convert from Euclidean division operations */
             | Int "/Int" Int [function, symbol(_/Int_), left, smt-hook((ite (or (= 0 (mod #1 #2)) (>= #1 0)) (div #1 #2) (ite (> #2 0) (+ (div #1 #2) 1) (- (div #1 #2) 1)))), hook(INT.tdiv)]
             | Int "%Int" Int [function, symbol(_%Int_), left, smt-hook((ite (or (= 0 (mod #1 #2)) (>= #1 0)) (mod #1 #2) (ite (> #2 0) (- (mod #1 #2) #2) (+ (mod #1 #2) #2)))), hook(INT.tmod)]
             /* divInt and modInt implement e-division according to the Euclidean division theorem, therefore the remainder is always non-negative */
             | Int "divInt" Int [function, symbol(_divInt_), left, smt-hook(div), hook(INT.ediv)]
             | Int "modInt" Int [function, symbol(_modInt_), left, smt-hook(mod), hook(INT.emod)]
             > left:
               Int "+Int" Int [function, total, symbol(_+Int_), left, comm, smt-hook(+), hook(INT.add)]
             | Int "-Int" Int [function, total, symbol(_-Int_), left, smt-hook(-), hook(INT.sub)]
             > left:
               Int ">>Int" Int [function, symbol(_>>Int_), left, hook(INT.shr), smtlib(shrInt)]
             | Int "<<Int" Int [function, symbol(_<<Int_), left, hook(INT.shl), smtlib(shlInt)]
             > left:
               Int "&Int" Int [function, total, symbol(_&Int_), left, comm, hook(INT.and), smtlib(andInt)]
             > left:
               Int "xorInt" Int [function, total, symbol(_xorInt_), left, comm, hook(INT.xor), smtlib(xorInt)]
             > left:
               Int "|Int" Int [function, total, symbol(_|Int_), left, comm, hook(INT.or), smtlib(orInt)]
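The two division conventions differ only when the dividend is negative and the division is inexact. The sketch below wraps each operator in a function so the results can be compared side by side. The module name INT-DIVISION-EXAMPLE and the wrappers tQuot, tRem, eQuot, and eRem are hypothetical names used only for illustration.

```k
module INT-DIVISION-EXAMPLE
  imports INT

  // Hypothetical wrappers around the builtin operators.
  syntax Int ::= tQuot(Int, Int) [function]
               | tRem(Int, Int)  [function]
               | eQuot(Int, Int) [function]
               | eRem(Int, Int)  [function]

  // t-division rounds the quotient towards zero:
  //   tQuot(-7, 2) is -3 and tRem(-7, 2) is -1, since -7 = 2 * (-3) + (-1).
  rule tQuot(A:Int, B:Int) => A /Int B requires B =/=Int 0
  rule tRem(A:Int, B:Int)  => A %Int B requires B =/=Int 0

  // Euclidean division keeps the remainder non-negative:
  //   eQuot(-7, 2) is -4 and eRem(-7, 2) is 1, since -7 = 2 * (-4) + 1.
  rule eQuot(A:Int, B:Int) => A divInt B requires B =/=Int 0
  rule eRem(A:Int, B:Int)  => A modInt B requires B =/=Int 0
endmodule
```

For non-negative dividends and positive divisors the two conventions agree, so the choice mainly matters when modelling a language whose division semantics on negative numbers must be matched exactly.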
You can compute the minimum and maximum minInt
and maxInt
of two integers.
syntax Int ::= "minInt" "(" Int "," Int ")" [function, total, smt-hook((ite (< #1 #2) #1 #2)), hook(INT.min)] | "maxInt" "(" Int "," Int ")" [function, total, smt-hook((ite (< #1 #2) #2 #1)), hook(INT.max)]
You can compute the absolute value absInt
of an integer.
syntax Int ::= absInt ( Int ) [function, total, smt-hook((ite (< #1 0) (- 0 #1) #1)), hook(INT.abs)]
You can compute the log base 2, rounded towards zero, of an integer. The log
base 2 of an integer is equal to the index of the highest bit set in the
representation of a positive integer. Log base 2 of zero or a negative number
is #False
.
syntax Int ::= log2Int ( Int ) [function, hook(INT.log2)]
You can compute the value of a range of bits in the twos-complement
representation of an integer, interpreted as either unsigned or signed. index
is offset from 0 and length
is the number of bits, starting
with index
, that should be read. The number is assumed to be represented
in little endian notation with each byte going from least significant to
most significant. In other words, 0 is the least-significant bit, and each
successive bit is more significant than the last.
syntax Int ::= bitRangeInt ( Int, index: Int, length: Int ) [function, hook(INT.bitRange)] | signExtendBitRangeInt ( Int, index: Int, length: Int ) [function, hook(INT.signExtendBitRange)]
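As a worked example of the bit-range operations, the sketch below extracts bits 4 through 7 of an integer. The module name INT-BITRANGE-EXAMPLE and the functions highNibble and signedHighNibble are hypothetical.

```k
module INT-BITRANGE-EXAMPLE
  imports INT

  // Hypothetical helpers that read 4 bits starting at bit index 4.
  syntax Int ::= highNibble(Int)       [function]
               | signedHighNibble(Int) [function]

  // 171 is 10101011 in binary, so bits 4..7 are 1010.
  // Read as unsigned, 1010 is 10, so highNibble(171) is 10.
  rule highNibble(I:Int) => bitRangeInt(I, 4, 4)

  // Read as a signed 4-bit value, 1010 is -6, so signedHighNibble(171) is -6.
  rule signedHighNibble(I:Int) => signExtendBitRangeInt(I, 4, 4)
endmodule
```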
You can compute whether an integer is less than or equal to, less than,
greater than or equal to, greater than, equal to, or unequal to another integer.
syntax Bool ::= Int "<=Int" Int [function, total, symbol(_<=Int_), smt-hook(<=), hook(INT.le)] | Int "<Int" Int [function, total, symbol(_<Int_), smt-hook(<), hook(INT.lt)] | Int ">=Int" Int [function, total, symbol(_>=Int_), smt-hook(>=), hook(INT.ge)] | Int ">Int" Int [function, total, symbol(_>Int_), smt-hook(>), hook(INT.gt)] | Int "==Int" Int [function, total, symbol(_==Int_), comm, smt-hook(=), hook(INT.eq)] | Int "=/=Int" Int [function, total, symbol(_=/=Int_), comm, smt-hook(distinct), hook(INT.ne)]
You can compute whether one integer evenly divides another. This is the
case when the second integer modulo the first integer is equal to zero.
syntax Bool ::= Int "dividesInt" Int [function]
You can, on concrete backends, compute a pseudorandom integer, or seed the
pseudorandom number generator. These operations are represented as
uninterpreted functions on symbolic backends.
syntax Int ::= randInt(Int) [function, hook(INT.rand), impure] syntax K ::= srandInt(Int) [function, hook(INT.srand), impure]
The remainder of this section consists of an implementation in K of some
of the operators above, as well as lemmas used by the Java and Haskell backend
to simplify expressions of sort Int
. They do not affect the semantics of
integers, merely describing additional rules that the backend can use to
simplify terms.
endmodule

module INT-SYMBOLIC [symbolic]
  imports INT-COMMON
  imports INT-SYMBOLIC-KORE
  imports private BOOL

  // Arithmetic Normalization
  rule I +Int 0 => I [simplification]
  rule I -Int 0 => I [simplification]

  rule X modInt N => X requires 0 <=Int X andBool X <Int N [simplification]
  rule X %Int N => X requires 0 <=Int X andBool X <Int N [simplification]

  // Bit-shifts
  rule X <<Int 0 => X [simplification, preserves-definedness]
  rule 0 <<Int Y => 0 requires 0 <=Int Y [simplification, preserves-definedness]
  rule X >>Int 0 => X [simplification, preserves-definedness]
  rule 0 >>Int Y => 0 requires 0 <=Int Y [simplification, preserves-definedness]
endmodule

module INT-SYMBOLIC-KORE [symbolic, haskell]
  imports INT-COMMON
  imports ML-SYNTAX
  imports private BOOL

  // Definability Conditions
  rule #Ceil(@I1:Int /Int @I2:Int) => {(@I2 =/=Int 0) #Equals true} #And #Ceil(@I1) #And #Ceil(@I2) [simplification]
  rule #Ceil(@I1:Int %Int @I2:Int) => {(@I2 =/=Int 0) #Equals true} #And #Ceil(@I1) #And #Ceil(@I2) [simplification]
  rule #Ceil(@I1:Int modInt @I2:Int) => {(@I2 =/=Int 0) #Equals true} #And #Ceil(@I1) #And #Ceil(@I2) [simplification]
  rule #Ceil(@I1:Int >>Int @I2:Int) => {(@I2 >=Int 0) #Equals true} #And #Ceil(@I1) #And #Ceil(@I2) [simplification]
  rule #Ceil(@I1:Int <<Int @I2:Int) => {(@I2 >=Int 0) #Equals true} #And #Ceil(@I1) #And #Ceil(@I2) [simplification]
endmodule

module INT-KORE [symbolic]
  imports private K-EQUAL
  imports private BOOL
  imports INT-COMMON

  rule [eq-k-to-eq-int] : I1:Int ==K I2:Int => I1 ==Int I2 [simplification]
  rule [eq-int-true-left] : {K1 ==Int K2 #Equals true} => {K1 #Equals K2} [simplification]
  rule [eq-int-true-rigth] : {true #Equals K1 ==Int K2} => {K1 #Equals K2} [simplification]
  rule [eq-int-false-left] : {K1 ==Int K2 #Equals false} => #Not({K1 #Equals K2}) [simplification]
  rule [eq-int-false-rigth] : {false #Equals K1 ==Int K2} => #Not({K1 #Equals K2}) [simplification]
  rule [neq-int-true-left] : {K1 =/=Int K2 #Equals true} => #Not({K1 #Equals K2}) [simplification]
  rule [neq-int-true-right] : {true #Equals K1 =/=Int K2} => #Not({K1 #Equals K2}) [simplification]
  rule [neq-int-false-left] : {K1 =/=Int K2 #Equals false} => {K1 #Equals K2} [simplification]
  rule [neq-int-false-right]: {false #Equals K1 =/=Int K2} => {K1 #Equals K2} [simplification]

  // Arithmetic Normalization
  rule I +Int B => B +Int I [concrete(I), symbolic(B), simplification(51)]
  rule A -Int I => A +Int (0 -Int I) [concrete(I), symbolic(A), simplification(51)]
  rule (A +Int I2) +Int I3 => A +Int (I2 +Int I3) [concrete(I2, I3), symbolic(A), simplification]
  rule I1 +Int (B +Int I3) => B +Int (I1 +Int I3) [concrete(I1, I3), symbolic(B), simplification]
  rule I1 -Int (B +Int I3) => (I1 -Int I3) -Int B [concrete(I1, I3), symbolic(B), simplification]
  rule I1 +Int (I2 +Int C) => (I1 +Int I2) +Int C [concrete(I1, I2), symbolic(C), simplification]
  rule I1 +Int (I2 -Int C) => (I1 +Int I2) -Int C [concrete(I1, I2), symbolic(C), simplification]
  rule (I1 -Int B) +Int I3 => (I1 +Int I3) -Int B [concrete(I1, I3), symbolic(B), simplification]
  rule I1 -Int (I2 +Int C) => (I1 -Int I2) -Int C [concrete(I1, I2), symbolic(C), simplification]
  rule I1 -Int (I2 -Int C) => (I1 -Int I2) +Int C [concrete(I1, I2), symbolic(C), simplification]
  rule (C -Int I2) -Int I3 => C -Int (I2 +Int I3) [concrete(I2, I3), symbolic(C), simplification]

  rule I1 &Int (I2 &Int C) => (I1 &Int I2) &Int C [concrete(I1, I2), symbolic(C), simplification]
endmodule

module INT
  imports INT-COMMON
  imports INT-SYMBOLIC
  imports INT-KORE
  imports private K-EQUAL
  imports private BOOL

  rule bitRangeInt(I::Int, IDX::Int, LEN::Int) => (I >>Int IDX) modInt (1 <<Int LEN)

  rule signExtendBitRangeInt(I::Int, IDX::Int, LEN::Int) => (bitRangeInt(I, IDX, LEN) +Int (1 <<Int (LEN -Int 1))) modInt (1 <<Int LEN) -Int (1 <<Int (LEN -Int 1))

  rule I1:Int divInt I2:Int => (I1 -Int (I1 modInt I2)) /Int I2
    requires I2 =/=Int 0
  rule I1:Int modInt I2:Int => ((I1 %Int absInt(I2)) +Int absInt(I2)) %Int absInt(I2)
    requires I2 =/=Int 0 [concrete, simplification]

  rule minInt(I1:Int, I2:Int) => I1 requires I1 <Int I2
  rule minInt(I1:Int, I2:Int) => I2 requires I1 >=Int I2

  rule I1:Int =/=Int I2:Int => notBool (I1 ==Int I2)
  rule (I1:Int dividesInt I2:Int) => (I2 %Int I1) ==Int 0

  syntax Int ::= freshInt(Int) [freshGenerator, function, total, private]
  rule freshInt(I:Int) => I
endmodule
Provided here is the syntax of an implementation of arbitrary-precision
floating-point arithmetic in K based on a generalization of the IEEE 754
standard. This type is hooked to an implementation of floats provided by the
backend.
The syntax of ordinary floating-point values in K consists of an optional sign
(+ or -) followed by an optional integer part, followed by a decimal point,
followed by an optional fractional part. Either the integer part or the
fractional part must be specified. The mantissa is followed by an optional
exponent part, which consists of an e
or E
, an optional sign (+ or -),
and an integer. The exponent is followed by an optional suffix, which can be
either f
, F
, d
, D
, or pNxM
where N
and M
are positive integers.
p
and x
can be either upper or lowercase.
The value of a floating-point literal is computed as follows: First the
mantissa is read as a rational number. Then it is multiplied by 10 to the
power of the exponent, which is interpreted as an integer, and defaults to
zero if it is not present. Finally, it is rounded to the nearest possible
value in a floating-point type represented like an IEEE754 floating-point type,
with the number of bits of precision and exponent specified by the suffix.
A suffix of f
or F
represents the IEEE binary32
format. A suffix of d
or D
, or no suffix, represents the IEEE binary64
format. A suffix of
pNxM
(either upper or lowercase) specifies exactly N
bits of precision and
M
bits of exponent. The number of bits of precision is assumed to include
any optional 1
that precedes the IEEE 754 mantissa. In other words, p24x8
is equal to the IEEE binary32
format, and p53x11
is equal to the IEEE
binary64
format.
module FLOAT-SYNTAX syntax Float [hook(FLOAT.Float)] syntax Float ::= r"([\\+\\-]?[0-9]+(\\.[0-9]*)?|\\.[0-9]+)([eE][\\+\\-]?[0-9]+)?([fFdD]|([pP][0-9]+[xX][0-9]+))?" [token, prec(1)] syntax Float ::= r"[\\+\\-]?Infinity([fFdD]|([pP][0-9]+[xX][0-9]+))?" [token, prec(3)] syntax Float ::= r"NaN([fFdD]|([pP][0-9]+[xX][0-9]+))?" [token, prec(3)] endmodule module FLOAT imports FLOAT-SYNTAX imports private BOOL imports private INT-SYNTAX
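The suffix rules above can be sketched with a few literals. The module name FLOAT-LITERAL-EXAMPLE and the zero-argument functions below are hypothetical; the values in the comments follow from the description of the f, d, and pNxM suffixes given above.

```k
module FLOAT-LITERAL-EXAMPLE
  imports FLOAT
  imports INT

  // Hypothetical zero-argument helpers used only to inspect literals.
  syntax Int ::= singlePrecision() [function]
               | doublePrecision() [function]
               | customExponent()  [function]

  // The f suffix selects binary32, which has 24 bits of precision.
  rule singlePrecision() => precisionFloat(1.5f)

  // No suffix selects binary64, which has 53 bits of precision.
  rule doublePrecision() => precisionFloat(1.5)

  // p24x8 requests 24 bits of precision and 8 bits of exponent.
  rule customExponent() => exponentBitsFloat(1.5p24x8)
endmodule
```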
You can retrieve the number of bits of precision in a Float
.
syntax Int ::= precisionFloat(Float) [function, total, hook(FLOAT.precision)]
You can retrieve the number of bits of exponent range in a Float
.
syntax Int ::= exponentBitsFloat(Float) [function, total, hook(FLOAT.exponentBits)]
You can retrieve the value of the exponent bits of a Float
as an integer.
syntax Int ::= exponentFloat(Float) [function, total, hook(FLOAT.exponent)]
You can retrieve the value of the sign bit of a Float
as a boolean. True
means the sign bit is set.
syntax Bool ::= signFloat(Float) [function, total, hook(FLOAT.sign)]
You can check whether a Float
value is infinite or Not-a-Number.
syntax Bool ::= isNaN(Float) [function, total, smt-hook(fp.isNaN), hook(FLOAT.isNaN)] | isInfinite(Float) [function, total]
You can:
* Compute the unary negation --Float of a float. --Float X is distinct
  from 0.0 -Float X. For example, 0.0 -Float 0.0 is positive zero.
  --Float 0.0 is negative zero.
* Compute the exponentiation ^Float of two floats.
* Compute the product *Float, quotient /Float, or remainder %Float of two
  floats.
* Compute the sum +Float or difference -Float of two floats.
syntax Float ::= "--Float" Float [function, total, smt-hook(fp.neg), hook(FLOAT.neg)]
               > Float "^Float" Float [function, left, hook(FLOAT.pow)]
               > left:
                 Float "*Float" Float [function, left, smt-hook((fp.mul roundNearestTiesToEven #1 #2)), hook(FLOAT.mul)]
               | Float "/Float" Float [function, left, smt-hook((fp.div roundNearestTiesToEven #1 #2)), hook(FLOAT.div)]
               | Float "%Float" Float [function, left, smt-hook((fp.rem roundNearestTiesToEven #1 #2)), hook(FLOAT.rem)]
               > left:
                 Float "+Float" Float [function, left, smt-hook((fp.add roundNearestTiesToEven #1 #2)), hook(FLOAT.add)]
               | Float "-Float" Float [function, left, smt-hook((fp.sub roundNearestTiesToEven #1 #2)), hook(FLOAT.sub)]
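You can:
* Compute the Nth integer root rootFloat of a float.
* Compute the absolute value absFloat of a float.
* Round a float to a specified precision and exponent range (roundFloat).
  The resulting Float will yield the specified values when calling
  precisionFloat and exponentBitsFloat and when performing further
  computation.
* Round a float down to the next floating-point value which is an integer
  (floorFloat).
* Round a float up to the next floating-point value which is an integer
  (ceilFloat).
* Round a float towards zero to the next floating-point value which is an
  integer (truncFloat).
* Compute the natural exponential expFloat of a float (i.e. e^x).
* Compute the natural logarithm logFloat of a float.
* Compute the sine sinFloat of a float.
* Compute the cosine cosFloat of a float.
* Compute the tangent tanFloat of a float.
* Compute the arcsine asinFloat of a float.
* Compute the arccosine acosFloat of a float.
* Compute the arctangent atanFloat of a float.
* Compute the arctangent atan2Float of two floats.
* Compute the maximum maxFloat of two floats.
* Compute the minimum minFloat of two floats.
* Compute the square root sqrtFloat of a float.
* Compute the largest finite value expressible in a specified precision and
  exponent range (maxValueFloat).
* Compute the smallest positive finite value expressible in a specified
  precision and exponent range (minValueFloat).
syntax Float ::= rootFloat(Float, Int) [function, hook(FLOAT.root)]
               | absFloat(Float) [function, total, smt-hook(fp.abs), hook(FLOAT.abs)]
               | roundFloat(Float, precision: Int, exponentBits: Int) [function, hook(FLOAT.round)]
               | floorFloat(Float) [function, total, hook(FLOAT.floor)]
               | ceilFloat(Float) [function, total, hook(FLOAT.ceil)]
               | truncFloat(Float) [function, total, hook(FLOAT.trunc)]
               | expFloat(Float) [function, total, hook(FLOAT.exp)]
               | logFloat(Float) [function, hook(FLOAT.log)]
               | sinFloat(Float) [function, total, hook(FLOAT.sin)]
               | cosFloat(Float) [function, total, hook(FLOAT.cos)]
               | tanFloat(Float) [function, hook(FLOAT.tan)]
               | asinFloat(Float) [function, hook(FLOAT.asin)]
               | acosFloat(Float) [function, hook(FLOAT.acos)]
               | atanFloat(Float) [function, total, hook(FLOAT.atan)]
               | atan2Float(Float, Float) [function, hook(FLOAT.atan2)]
               | maxFloat(Float, Float) [function, smt-hook(fp.max), hook(FLOAT.max)]
               | minFloat(Float, Float) [function, smt-hook(fp.min), hook(FLOAT.min)]
               | sqrtFloat(Float) [function]
               | maxValueFloat(precision: Int, exponentBits: Int) [function, hook(FLOAT.maxValue)]
               | minValueFloat(precision: Int, exponentBits: Int) [function, hook(FLOAT.minValue)]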
You can compute whether a float is less than or equal to, less than, greater
than or equal to, greater than, equal to, or unequal to another float. Note
that X ==Float Y and X ==K Y might yield different values. The latter should
be used in cases where you want to compare whether two values of sort Float
contain the same term. The former should be used when you want to implement
the == operator of a programming language. In particular, NaN =/=Float NaN
is true, because NaN compares unequal to all values, including itself, in
IEEE 754 arithmetic. 0.0 ==Float -0.0 is also true.
syntax Bool ::= Float "<=Float" Float [function, smt-hook(fp.leq), hook(FLOAT.le)]
              | Float "<Float" Float [function, smt-hook(fp.lt), hook(FLOAT.lt)]
              | Float ">=Float" Float [function, smt-hook(fp.geq), hook(FLOAT.ge)]
              | Float ">Float" Float [function, smt-hook(fp.gt), hook(FLOAT.gt)]
              | Float "==Float" Float [function, comm, smt-hook(fp.eq), hook(FLOAT.eq), symbol(_==Float_)]
              | Float "=/=Float" Float [function, comm, smt-hook((not (fp.eq #1 #2)))]

rule F1:Float =/=Float F2:Float => notBool (F1 ==Float F2)
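The distinction between ==Float and ==K can be sketched with two wrappers. The module name FLOAT-COMPARE-EXAMPLE and the functions sameValue and sameTerm are hypothetical names introduced for illustration only.

```k
module FLOAT-COMPARE-EXAMPLE
  imports FLOAT
  imports K-EQUAL
  imports BOOL

  // sameValue is a hypothetical helper using IEEE 754 comparison, as one
  // would use to implement the == operator of a programming language.
  // Per the text above, a NaN argument never compares equal, even to itself,
  // while 0.0 and -0.0 do compare equal.
  syntax Bool ::= sameValue(Float, Float) [function]
  rule sameValue(X:Float, Y:Float) => X ==Float Y

  // sameTerm is a hypothetical helper that compares the two terms
  // themselves rather than their IEEE 754 values.
  syntax Bool ::= sameTerm(Float, Float) [function]
  rule sameTerm(X:Float, Y:Float) => X ==K Y
endmodule
```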
You can convert an integer to a floating-point number with the specified
precision and exponent range. You can also convert a floating-point number
to the nearest integer. This operation rounds to the nearest integer, but it
also avoids the double-rounding that is present in ceilFloat
and floorFloat
if the nearest integer is not representable in the specified floating-point
type.
syntax Float ::= Int2Float(Int, precision: Int, exponentBits: Int) [function, hook(FLOAT.int2float)] syntax Int ::= Float2Int(Float) [function, total, hook(FLOAT.float2int)]
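As a small sketch of these conversions, the following round-trips an integer through a floating-point value with 53 bits of precision and 11 exponent bits (the IEEE 754 binary64 layout). The module name and the function viaDouble are hypothetical.

```k
module FLOAT-CONVERSION-DEMO
  imports FLOAT
  imports INT

  // Hypothetical helper: convert an Int to a binary64-shaped Float and
  // back to the nearest Int.
  syntax Int ::= viaDouble(Int) [function]
  rule viaDouble(I) => Float2Int(Int2Float(I, 53, 11))
endmodule
```

For integers that fit in 53 bits the result should equal the input; larger integers may be rounded by Int2Float, so the round trip is not expected to be the identity in general.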
The remainder of this section consists of an implementation in K of some of the
operators above.
rule sqrtFloat(F:Float) => rootFloat(F, 2) rule isInfinite(F:Float) => F >Float maxValueFloat(precisionFloat(F), exponentBitsFloat(F)) orBool F <Float --Float maxValueFloat(precisionFloat(F), exponentBitsFloat(F)) endmodule
Provided here is the syntax of an implementation of Unicode strings in K. This
type is hooked to an implementation of strings provided by the backend. The
implementation is currently incomplete and does not fully support encodings
and code points beyond the initial 256 code points of the Basic Latin and
Latin-1 Supplement blocks. In the future, there may be breaking changes to
the semantics of this module in order to support this functionality.
The syntax of strings in K is delineated by double quotes. Inside the double
quotes, any character can appear verbatim except double quotes, backslash,
newline, and carriage return. K also supports the following escape sequences:
module STRING-SYNTAX syntax String [hook(STRING.String)] syntax String ::= r"[\\\"](([^\\\"\\n\\r\\\\])|([\\\\][nrtf\\\"\\\\])|([\\\\][x][0-9a-fA-F]{2})|([\\\\][u][0-9a-fA-F]{4})|([\\\\][U][0-9a-fA-F]{8}))*[\\\"]" [token] endmodule module STRING-COMMON imports STRING-SYNTAX imports private INT imports private FLOAT-SYNTAX imports private K-EQUAL imports private BOOL
You can concatenate two strings in O(N) time. For successive concatenation
operations, it may be better to use the STRING-BUFFER
module.
syntax String ::= String "+String" String [function, total, left, hook(STRING.concat)]
You can get the length of a string in O(1) time.
syntax Int ::= lengthString ( String ) [function, total, hook(STRING.length)]
You can convert between a character (as represented by a string containing
a single code point) and an integer in O(1) time.
syntax String ::= chrChar ( Int ) [function, hook(STRING.chr)] syntax Int ::= ordChar ( String ) [function, hook(STRING.ord)]
You can compute a substring of a string in O(N) time (where N is the
length of the substring). There are two important facts to note:
1. The range includes the character at startIndex but excludes the character
   at endIndex, i.e., the range is [startIndex..endIndex).
2. This function is only defined on valid indices, i.e., when
   startIndex < endIndex and endIndex is less than or equal to the string
   length.
syntax String ::= substrString ( String , startIndex: Int , endIndex: Int ) [function, total, hook(STRING.substr)]
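The half-open range can be illustrated with two small helpers. This is a sketch only; the module name and the functions takeString and dropString are hypothetical, and the side conditions simply restate the validity constraints described above.

```k
module SUBSTR-DEMO
  imports STRING
  imports INT
  imports BOOL

  // Hypothetical helpers, introduced only for illustration.
  syntax String ::= takeString(String, Int) [function]
                  | dropString(String, Int) [function]

  // Characters at indices 0 .. N-1; the character at index N is excluded.
  rule takeString(S, N) => substrString(S, 0, N)
    requires 0 <Int N andBool N <=Int lengthString(S)

  // Characters from index N up to the end of the string.
  rule dropString(S, N) => substrString(S, N, lengthString(S))
    requires 0 <=Int N andBool N <Int lengthString(S)
endmodule
```

For example, takeString("hello world", 5) selects indices 0 through 4 and so should yield "hello".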
You can find the first (respectively, last) occurrence of a substring, starting
at a certain index
, in another string in O(N*M) time.
Returns -1
if the substring is not found.
syntax Int ::= findString ( haystack: String , needle: String , index: Int ) [function, hook(STRING.find)] syntax Int ::= rfindString ( haystack: String , needle: String , index: Int ) [function, hook(STRING.rfind)]
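Because a failed search returns -1, a containment test reduces to a sign check. The sketch below assumes only the findString signature shown above; the module name and containsString are hypothetical.

```k
module FIND-DEMO
  imports STRING
  imports INT
  imports BOOL

  // Hypothetical helper: true when Needle occurs somewhere in Haystack.
  syntax Bool ::= containsString(String, String) [function]
  rule containsString(Haystack, Needle) => findString(Haystack, Needle, 0) >=Int 0
endmodule
```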
You can find the first (respectively, last) occurrence of one of the characters
of the search string, starting at a certain index
, in another string in
O(N*M) time.
syntax Int ::= findChar ( haystack: String , needles: String , index: Int ) [function, hook(STRING.findChar)] syntax Int ::= rfindChar ( haystack: String , needles: String , index: Int ) [function, hook(STRING.rfindChar)]
syntax String ::= Bool2String(Bool) [function, total] rule Bool2String(true) => "true" rule Bool2String(false) => "false"
syntax Bool ::= String2Bool(String) [function] rule String2Bool("true") => true rule String2Bool("false") => false
You can convert between a String
and a Float
. The String will be
represented in the syntax of the Float
sort (see the section on the FLOAT
module above for details of that syntax). Which particular string is returned
by Float2String
is determined by the backend, but the same Float
is
guaranteed to return the same String
, and converting that String
back to a
Float
is guaranteed to return the original Float
.
You can also convert a Float
to a string in a particular syntax using the
variant of Float2String
with a format
. In this case, the resulting string
is one which results directly from passing that format
to mpfr_printf
. This
functionality may not be supported on backends that do not use GNU MPFR to
implement floating-point numbers.
syntax String ::= Float2String ( Float ) [function, total, hook(STRING.float2string)] syntax String ::= Float2String ( Float , format: String ) [function, symbol(FloatFormat), hook(STRING.floatFormat)] syntax Float ::= String2Float ( String ) [function, hook(STRING.string2float)]
You can convert between a String
and an Int
. The String will be represented
in the syntax of the INT
module (i.e., a nonempty sequence of digits
optionally prefixed by a sign). When converting from an Int
to a String
,
the sign will not be present unless the integer is negative.
You can also convert between a String
and an Int
in a particular radix.
This radix can be anywhere between 2 and 36. For a radix 2 <= N <= 10, the
digits 0 to N-1 will be used. For a radix 11 <= N <= 36, the digits 0 to 9
and the first N-10 letters of the Latin alphabet will be used. Both uppercase
and lowercase letters are supported by String2Base
. Whether the letters
returned by Base2String
are upper or lowercase is determined by the backend,
but the backend will consistently choose one or the other.
syntax Int ::= String2Int ( String ) [function, hook(STRING.string2int)] syntax String ::= Int2String ( Int ) [function, total, hook(STRING.int2string)] syntax String ::= Base2String ( Int , base: Int ) [function, hook(STRING.base2string)] syntax Int ::= String2Base ( String , base: Int ) [function, hook(STRING.string2base)]
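A hexadecimal conversion pair is one common use of the radix variants. This is a minimal sketch; the module name and the functions toHex and fromHex are hypothetical.

```k
module RADIX-DEMO
  imports STRING
  imports INT

  // Hypothetical helpers, introduced only for illustration.
  syntax String ::= toHex(Int)      [function]
  syntax Int    ::= fromHex(String) [function]

  // The case of the letters in the result is chosen by the backend.
  rule toHex(I) => Base2String(I, 16)

  // Accepts both uppercase and lowercase digits, per the text above.
  rule fromHex(S) => String2Base(S, 16)
endmodule
```

Since the letter case returned by Base2String is backend-dependent, code that compares the output of toHex against a literal should not assume a particular case.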
You can replace one, some, or all occurrences of a string within another
string in O(N*M) time. The replaceAll, replace, and replaceFirst functions
differ only in how many occurrences they replace: replaceFirst replaces only
the first occurrence, replace replaces the first times occurrences, and
replaceAll replaces every occurrence.
You can also count the number of times a string occurs within another string
using countAllOccurrences.
syntax String ::= "replaceAll" "(" haystack: String "," needle: String "," replacement: String ")" [function, total, hook(STRING.replaceAll)] syntax String ::= "replace" "(" haystack: String "," needle: String "," replacement: String "," times: Int ")" [function, hook(STRING.replace)] syntax String ::= "replaceFirst" "(" haystack: String "," needle: String "," replacement: String ")" [function, total, hook(STRING.replaceFirst)] syntax Int ::= "countAllOccurrences" "(" haystack: String "," needle: String ")" [function, total, hook(STRING.countAllOccurrences)]
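As a brief sketch of how these functions might be used, the following normalizes line endings and counts lines. The module name and the functions normalizeNewlines and countNewlines are hypothetical.

```k
module REPLACE-DEMO
  imports STRING
  imports INT

  // Hypothetical helpers, introduced only for illustration.
  syntax String ::= normalizeNewlines(String) [function]
  syntax Int    ::= countNewlines(String)     [function]

  // Replace every CRLF pair with a single LF.
  rule normalizeNewlines(S) => replaceAll(S, "\r\n", "\n")

  // Count how many LF characters the string contains.
  rule countNewlines(S) => countAllOccurrences(S, "\n")
endmodule
```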
You can compare whether two strings are equal or unequal, or whether one string
is less than, less than or equal to, greater than, or greater than or equal to
another according to the natural lexicographic ordering of strings.
syntax Bool ::= String "==String" String [function, total, comm, hook(STRING.eq)] | String "=/=String" String [function, total, comm, hook(STRING.ne)] rule S1:String =/=String S2:String => notBool (S1 ==String S2) syntax Bool ::= String "<String" String [function, total, hook(STRING.lt)] | String "<=String" String [function, total, hook(STRING.le)] | String ">String" String [function, total, hook(STRING.gt)] | String ">=String" String [function, total, hook(STRING.ge)]
What follows are a few String hooks that are deprecated and supported only
on certain outdated backends of K, as well as an implementation of several
of the above operations in K.
syntax String ::= categoryChar(String) [function, hook(STRING.category)]
                | directionalityChar(String) [function, hook(STRING.directionality)]
syntax String ::= "newUUID" [function, hook(STRING.uuid), impure]

rule S1:String <=String S2:String => notBool (S2 <String S1)
rule S1:String >String S2:String => S2 <String S1
rule S1:String >=String S2:String => notBool (S1 <String S2)

rule findChar(S1:String, S2:String, I:Int)
  => #if findString(S1, substrString(S2, 0, 1), I) ==Int -1
     #then findChar(S1, substrString(S2, 1, lengthString(S2)), I)
     #else #if findChar(S1, substrString(S2, 1, lengthString(S2)), I) ==Int -1
           #then findString(S1, substrString(S2, 0, 1), I)
           #else minInt(findString(S1, substrString(S2, 0, 1), I), findChar(S1, substrString(S2, 1, lengthString(S2)), I))
           #fi
     #fi
  requires S2 =/=String ""
rule findChar(_, "", _) => -1

rule rfindChar(S1:String, S2:String, I:Int)
  => maxInt(rfindString(S1, substrString(S2, 0, 1), I), rfindChar(S1, substrString(S2, 1, lengthString(S2)), I))
  requires S2 =/=String ""
rule rfindChar(_, "", _) => -1

rule countAllOccurrences(Source:String, ToCount:String) => 0
  requires findString(Source, ToCount, 0) <Int 0
rule countAllOccurrences(Source:String, ToCount:String)
  => 1 +Int countAllOccurrences(substrString(Source, findString(Source, ToCount, 0) +Int lengthString(ToCount), lengthString(Source)), ToCount)
  requires findString(Source, ToCount, 0) >=Int 0

rule replaceFirst(Source:String, ToReplace:String, Replacement:String)
  => substrString(Source, 0, findString(Source, ToReplace, 0))
     +String Replacement
     +String substrString(Source, findString(Source, ToReplace, 0) +Int lengthString(ToReplace), lengthString(Source))
  requires findString(Source, ToReplace, 0) >=Int 0
rule replaceFirst(Source:String, ToReplace:String, _:String) => Source
  requires findString(Source, ToReplace, 0) <Int 0

// Note that the replace function is undefined when Count < 0. This allows different backends to
// implement their own behavior without contradicting these semantics. For instance, a symbolic
// backend can return #Bottom for that case, while a concrete backend can throw an exception.
rule replace(Source:String, ToReplace:String, Replacement:String, Count:Int)
  => substrString(Source, 0, findString(Source, ToReplace, 0))
     +String Replacement
     +String replace(substrString(Source, findString(Source, ToReplace, 0) +Int lengthString(ToReplace), lengthString(Source)), ToReplace, Replacement, Count -Int 1)
  requires Count >Int 0 andBool findString(Source, ToReplace, 0) >=Int 0
rule replace(Source:String, _, _, Count) => Source
  requires Count >=Int 0 [owise]

rule replaceAll(Source:String, ToReplace:String, Replacement:String)
  => replace(Source, ToReplace, Replacement, countAllOccurrences(Source, ToReplace))
endmodule

module STRING-KORE [symbolic]
  imports private K-EQUAL
  imports STRING-COMMON

  rule S1:String ==K S2:String => S1 ==String S2 [simplification]
endmodule

module STRING
  imports STRING-COMMON
  imports STRING-KORE
endmodule
It is a well known fact that repeated string concatenations are quadratic
in performance whereas use of an efficient mutable representation of arrays
can yield linear performance. We thus provide such a sort, the StringBuffer
sort. Axiomatically, it is implemented below on symbolic backends using the
String
module. However, on concrete backends it provides an efficient
implementation of string concatenation. There are three operations:
1. .StringBuffer creates a new StringBuffer with current content equal to
   the empty string.
2. +String takes a StringBuffer and a String and appends the String to the
   end of the StringBuffer.
3. StringBuffer2String converts a StringBuffer to a String. This operation
   copies the string, so that subsequent modifications to the StringBuffer
   do not change the value of the String returned by this function.
module STRING-BUFFER-IN-K [symbolic]
  imports private BASIC-K
  imports STRING

  syntax StringBuffer ::= ".StringBuffer" [function, total]
  syntax StringBuffer ::= StringBuffer "+String" String [function, total, avoid]
  syntax StringBuffer ::= String
  syntax String ::= StringBuffer2String ( StringBuffer ) [function, total]

  rule {SB:String +String S:String}::StringBuffer => (SB +String S)::String
  rule .StringBuffer => ""
  rule StringBuffer2String(S:String) => S
endmodule

module STRING-BUFFER-HOOKED [concrete]
  imports private BASIC-K
  imports STRING

  syntax StringBuffer [hook(BUFFER.StringBuffer)]
  syntax StringBuffer ::= ".StringBuffer" [function, total, hook(BUFFER.empty), impure]
  syntax StringBuffer ::= StringBuffer "+String" String [function, total, hook(BUFFER.concat), avoid]
  syntax String ::= StringBuffer2String ( StringBuffer ) [function, total, hook(BUFFER.toString)]
endmodule

module STRING-BUFFER
  imports STRING-BUFFER-HOOKED
  imports STRING-BUFFER-IN-K
endmodule
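The three operations can be combined to build a string from many pieces without repeated String concatenation. The following is a sketch under the assumption that a StringBuffer may be threaded through an ordinary function as an accumulator; the module name and the functions repeatString and #repeatInto are hypothetical.

```k
module STRING-BUFFER-DEMO
  imports STRING-BUFFER
  imports INT

  // Hypothetical helpers, introduced only for illustration.
  syntax String       ::= repeatString(String, Int)              [function]
  syntax StringBuffer ::= #repeatInto(StringBuffer, String, Int) [function]

  // Start from an empty buffer and convert to a String only once, at the end.
  rule repeatString(S, N) => StringBuffer2String(#repeatInto(.StringBuffer, S, N))

  // Append S to the buffer N times.
  rule #repeatInto(SB, _, N) => SB
    requires N <=Int 0
  rule #repeatInto(SB, S, N) => #repeatInto(SB +String S, S, N -Int 1)
    requires N >Int 0
endmodule
```

The design point is that the intermediate results stay in the StringBuffer sort, so on concrete backends each append can reuse the same underlying buffer.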
Provided here is the syntax of an implementation of fixed-width arrays of Bytes
in K. This type is hooked to an implementation of bytes provided by the backend.
On the LLVM backend, it is possible to opt in to a faster, mutable
representation (using the --llvm-mutable-bytes
flag to kompile
) where
multiple references can occur to the same Bytes
object and when one is
modified, the others are also modified. Care should be taken when using this
feature, however, as it is possible to experience divergent behavior with
symbolic backends unless the Bytes
type is used in a manner that preserves
consistency.
module BYTES-SYNTAX imports private STRING-SYNTAX syntax Bytes [hook(BYTES.Bytes)] syntax Bytes ::= r"b[\\\"](([ !#-\\[\\]-~])|([\\\\][tnfr\\\"\\\\])|([\\\\][x][0-9a-fA-F]{2}))*[\\\"]" [token] endmodule
module BYTES-STRING-ENCODE [symbolic] imports BYTES-SYNTAX
You can encode/decode between Bytes and String using UTF-8, UTF-16LE,
UTF-16BE, UTF-32LE, and UTF-32BE.
syntax String ::= decodeBytes ( encoding: String , contents: Bytes ) [function, hook(BYTES.decodeBytes)] syntax Bytes ::= encodeBytes ( encoding: String , contents: String ) [function, hook(BYTES.encodeBytes)] endmodule
module BYTES-HOOKED imports STRING-SYNTAX imports BYTES-SYNTAX imports BYTES-STRING-ENCODE
The byte array of length zero is represented by .Bytes
.
syntax Bytes ::= ".Bytes" [function, total, hook(BYTES.empty)]
When converting to/from an integer, byte arrays can be treated as either little
endian (i.e., least significant byte first) or big endian (i.e., most significant
byte first).
syntax Endianness ::= "LE" [symbol(littleEndianBytes)] | "BE" [symbol(bigEndianBytes)]
When converting to/from an integer, byte arrays can be treated as either signed
or unsigned.
syntax Signedness ::= "Signed" [symbol(signedBytes)] | "Unsigned" [symbol(unsignedBytes)]
You can convert from a Bytes
to an Int
. In order to do this, the endianness
and signedness of the Bytes
must be provided. The resulting integer is
created by interpreting the Bytes
as either a two's-complement
representation, or an unsigned representation, of an integer, in the specified
byte order.
You can also convert from an Int
to a Bytes
. This comes in two variants.
In the first, the length
of the resulting Bytes
in bytes is explicitly
specified. If the length
is greater than the highest set bit in the magnitude
of the integer, the result is padded with 0 bits if the number is positive
and 1 bits if the number is negative. If the length
is less than the highest
bit set in the magnitude of the integer, the most-significant bits of the
integer will be truncated. The endianness of the resulting Bytes
object
is as specified.
In the second variant, both endianness and signedness are specified, and
the resulting Bytes
object will be the smallest number of bytes necessary
for the resulting Bytes
object to be convertible back to the original integer
via Bytes2Int
. In other words, if the highest bit set in the magnitude of the
integer is N, then the byte array will be at least N+1 bits long, rounded up
to the nearest byte.
syntax Int ::= Bytes2Int(Bytes, Endianness, Signedness) [function, total, hook(BYTES.bytes2int)] syntax Bytes ::= Int2Bytes(length: Int, Int, Endianness) [function, total, hook(BYTES.int2bytes)] | Int2Bytes(Int, Endianness, Signedness) [function, total, symbol(Int2BytesNoLen)]
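A fixed-width field codec is a typical use of these conversions. The sketch below encodes and decodes an unsigned 32-bit big-endian field; the module name and the functions encodeU32 and decodeU32 are hypothetical.

```k
module BYTES-INT-DEMO
  imports BYTES
  imports INT

  // Hypothetical helpers, introduced only for illustration.
  syntax Bytes ::= encodeU32(Int)   [function]
  syntax Int   ::= decodeU32(Bytes) [function]

  // Explicit-length variant: always produces exactly 4 bytes, padding or
  // truncating the most significant bits as described above.
  rule encodeU32(I) => Int2Bytes(4, I, BE)

  // Interpret exactly 4 bytes as an unsigned big-endian integer.
  rule decodeU32(B) => Bytes2Int(B, BE, Unsigned)
    requires lengthBytes(B) ==Int 4
endmodule
```

For instance, 258 is 0x0102, so encodeU32(258) should produce the four bytes 00 00 01 02, and decoding those bytes should give back 258.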
You can convert between a Bytes
and a String
in O(N) time. The resulting
value is a copy of the original and will not be affected by subsequent
mutations of the input or output value.
syntax String ::= Bytes2String(Bytes) [function, total, hook(BYTES.bytes2string)] syntax Bytes ::= String2Bytes(String) [function, total, hook(BYTES.string2bytes)]
You can set the value of a particular byte in a Bytes
object in O(1) time.
The result is #False
if value
is not in the range [0..255] or if index
is not a valid index (i.e., less than zero or greater than or equal to the length
of the Bytes
term).
syntax Bytes ::= Bytes "[" index: Int "<-" value: Int "]" [function, hook(BYTES.update)]
You can get the value of a particular byte in a Bytes
object in O(1) time.
The result is #False
if index
is not a valid index (see above).
syntax Int ::= Bytes "[" Int "]" [function, hook(BYTES.get)]
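Reading and updating a byte can be combined into a single step. This is a minimal sketch; the module name and the function incrementByte are hypothetical, and the side condition restates the index validity requirement described above.

```k
module BYTES-UPDATE-DEMO
  imports BYTES
  imports INT
  imports BOOL

  // Hypothetical helper: add 1 to the byte at index I, wrapping at 256.
  syntax Bytes ::= incrementByte(Bytes, Int) [function]
  rule incrementByte(B, I) => B [ I <- (B [ I ] +Int 1) modInt 256 ]
    requires 0 <=Int I andBool I <Int lengthBytes(B)
endmodule
```

The modInt 256 keeps the stored value inside the range [0..255] that the update operation requires.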
You can get a new Bytes
object containing a range of bytes from the input
Bytes
in O(N) time (where N is the length of the substring). The range
of bytes included is [startIndex..endIndex)
. The resulting Bytes
is
a copy, and mutations to it do not affect the original Bytes
.
The result is #False
if startIndex
or endIndex
is not valid.
syntax Bytes ::= substrBytes(Bytes, startIndex: Int, endIndex: Int) [function, hook(BYTES.substr)]
You can modify a Bytes
to return a Bytes
which is equal to dest
except the
N
elements starting at index
are replaced with the contents of src
in O(N)
time. If --llvm-mutable-bytes
is active, this will not create a new Bytes
object and will instead modify the original on concrete backends. The result is
#False
if index
+ N
is not a valid index.
syntax Bytes ::= replaceAtBytes(dest: Bytes, index: Int, src: Bytes) [function, hook(BYTES.replaceAt)]
You can modify a Bytes
to return a Bytes
which is equal to dest
except the
count
bytes starting at index
are replaced with count
bytes of value
Int2Bytes(1, v, LE/BE)
in O(count) time. This does not create a new Bytes
object and will instead modify the original if --llvm-mutable-bytes
is active.
This will throw an exception if index
+ count
is not a valid index. The
acceptable range of values for v
is -128 to 127. This will throw an exception
if v
is outside of this range. This is implemented only for the LLVM backend.
syntax Bytes ::= memsetBytes(dest: Bytes, index: Int, count: Int, v: Int) [function, hook(BYTES.memset)]
You can create a new Bytes
object which is at least length
bytes long by
taking the input sequence and padding it on the right (respectively, on the
left) with the specified value
. If --llvm-mutable-bytes
is active, this does
not create a new Bytes
object if the input is already at least length
bytes
long, and will instead return the input unchanged. The result is #False
if
value
is not in the range [0..255]
, or if the length is negative.
syntax Bytes ::= padRightBytes(Bytes, length: Int, value: Int) [function, hook(BYTES.padRight)] | padLeftBytes(Bytes, length: Int, value: Int) [function, hook(BYTES.padLeft)]
You can reverse a Bytes
object in O(N) time. If --llvm-mutable-bytes
is
active, this will not create a new Bytes
object and will instead modify the
original.
syntax Bytes ::= reverseBytes(Bytes) [function, total, hook(BYTES.reverse)]
You can get the length of a Bytes
term in O(1) time.
syntax Int ::= lengthBytes(Bytes) [function, total, hook(BYTES.length), smtlib(lengthBytes)]
You can create a new Bytes
object by concatenating two Bytes
objects
together in O(N) time.
syntax Bytes ::= Bytes "+Bytes" Bytes [function, total, hook(BYTES.concat), right] endmodule
The remainder of this module consists of an implementation of some of the
operators listed above in K.
module BYTES-CONCRETE [concrete] imports BYTES-HOOKED endmodule module BYTES-KORE imports BYTES-HOOKED imports BYTES-SYMBOLIC-CEIL endmodule module BYTES-SYMBOLIC-CEIL [symbolic] imports BYTES-HOOKED imports private INT imports private BOOL rule #Ceil(padRightBytes(_, LEN, VAL)) => {(0 <=Int LEN andBool 0 <=Int VAL andBool VAL <Int 256) #Equals true} [simplification] rule #Ceil(padLeftBytes(_, LEN, VAL)) => {(0 <=Int LEN andBool 0 <=Int VAL andBool VAL <Int 256) #Equals true} [simplification] endmodule module BYTES imports BYTES-CONCRETE imports BYTES-KORE imports private INT rule Int2Bytes(I::Int, _::Endianness, _) => .Bytes requires I ==Int 0 rule Int2Bytes(I::Int, E::Endianness, Unsigned) => Int2Bytes((log2Int(I) +Int 8) /Int 8, I, E) requires I >Int 0 [preserves-definedness] rule Int2Bytes(I::Int, E::Endianness, Signed ) => Int2Bytes((log2Int(I) +Int 9) /Int 8, I, E) requires I >Int 0 [preserves-definedness] rule Int2Bytes(I::Int, E::Endianness, Signed ) => Int2Bytes((log2Int(~Int I) +Int 9) /Int 8, I, E) requires I <Int -1 [preserves-definedness] rule Int2Bytes(I::Int, E::Endianness, Signed ) => Int2Bytes(1, -1, E) requires I ==Int -1 [preserves-definedness] endmodule
Provided here is an implementation for program identifiers in K. Developers
of semantics for a particular language may wish to use their own implementation
instead of the one provided here if their syntax differs from the syntax
defined below. However, this is provided for convenience for developers who
do not care about the lexical syntax of identifiers.
Provided are the following pieces of functionality:
* Id2String - Convert an Id to a String containing its name
* String2Id - Convert a String to an Id with the specified name
module ID-SYNTAX-PROGRAM-PARSING
  imports BUILTIN-ID-TOKENS
  syntax Id ::= r"[A-Za-z\\_][A-Za-z0-9\\_]*" [prec(1), token]
              | #LowerId [token]
              | #UpperId [token]
endmodule

module ID-SYNTAX
  syntax Id [token]
endmodule

module ID-COMMON
  imports ID-SYNTAX
  imports private STRING

  syntax String ::= Id2String ( Id ) [function, total, hook(STRING.token2string)]
  syntax Id ::= String2Id (String) [function, total, hook(STRING.string2token)]
  syntax Id ::= freshId(Int) [freshGenerator, function, total, private]

  rule freshId(I:Int) => String2Id("_" +String Int2String(I))
endmodule

module ID
  imports ID-COMMON
endmodule
Provided here are implementations of two important primitives in K:
* ==K - the equality between two terms. Returns true if they are equal and
  false if they are not equal.
* #if #then #else #fi - polymorphic conditional function. If the first
  argument evaluates to true, the second argument is returned. Otherwise,
  the third argument is returned.
module K-EQUAL-SYNTAX
  imports private BOOL
  imports private BASIC-K

  syntax Bool ::= left:
                  K "==K" K [function, total, comm, smt-hook(=), hook(KEQUAL.eq), symbol(_==K_), group(equalEqualK)]
                | K "=/=K" K [function, total, comm, smt-hook(distinct), hook(KEQUAL.ne), symbol(_=/=K_), group(notEqualEqualK)]

  syntax priority equalEqualK notEqualEqualK > boolOperation mlOp

  syntax {Sort} Sort ::= "#if" Bool "#then" Sort "#else" Sort "#fi" [function, total, symbol(ite), smt-hook(ite), hook(KEQUAL.ite)]
endmodule

module K-EQUAL-KORE [symbolic]
  imports private BOOL
  imports K-EQUAL-SYNTAX

  rule K1:Bool ==K K2:Bool => K1 ==Bool K2 [simplification]
  rule {K1 ==K K2 #Equals true} => {K1 #Equals K2} [simplification]
  rule {true #Equals K1 ==K K2} => {K1 #Equals K2} [simplification]
  rule {K1 ==K K2 #Equals false} => #Not({K1 #Equals K2}) [simplification]
  rule {false #Equals K1 ==K K2} => #Not({K1 #Equals K2}) [simplification]
  rule {K1 =/=K K2 #Equals true} => #Not({K1 #Equals K2}) [simplification]
  rule {true #Equals K1 =/=K K2} => #Not({K1 #Equals K2}) [simplification]
  rule {K1 =/=K K2 #Equals false} => {K1 #Equals K2} [simplification]
  rule {false #Equals K1 =/=K K2} => {K1 #Equals K2} [simplification]
endmodule

module K-EQUAL
  imports private BOOL
  imports K-EQUAL-SYNTAX
  imports K-EQUAL-KORE

  rule K1:K =/=K K2:K => notBool (K1 ==K K2)
  rule #if C:Bool #then B1::K #else _ #fi => B1 requires C
  rule #if C:Bool #then _ #else B2::K #fi => B2 requires notBool C
endmodule
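The conditional can be used directly on the right-hand side of a function rule. The following is a small sketch; the module name and the function absDiff are hypothetical.

```k
module K-EQUAL-DEMO
  imports K-EQUAL
  imports INT
  imports BOOL

  // Hypothetical helper: the absolute difference of two integers.
  syntax Int ::= absDiff(Int, Int) [function, total]
  rule absDiff(A, B) => #if A >=Int B #then A -Int B #else B -Int A #fi
endmodule
```

Because the conditional is polymorphic in its branch sort, the same construct works unchanged for String, Bytes, or user-defined sorts.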
Provided below are a few miscellaneous, mostly deprecated functions in K.
It is not recommended to use any of them directly as they are largely
unsupported in modern K. There are a few exceptions:
* #getenv - Returns the value of an environment variable
* #kompiledDirectory - Returns the path to the current compiled K definition
* #unparseKORE - Takes a K term and converts it to a string.
module K-REFLECTION
  imports BASIC-K
  imports STRING

  syntax K ::= "#configuration" [function, impure, hook(KREFLECTION.configuration)]
  syntax String ::= #sort(K) [function, hook(KREFLECTION.sort)]
  syntax KItem ::= #fresh(String) [function, hook(KREFLECTION.fresh), impure]
  syntax KItem ::= getsymbol(K) [function, hook(KREFLECTION.getKLabel)]

  syntax K ::= #getenv(String) [function, impure, hook(KREFLECTION.getenv)]

  syntax String ::= #kompiledDirectory() [function, hook(KREFLECTION.kompiledDir)]

  // meaningful only for the purposes of compilation to a binary, otherwise
  // undefined
  syntax List ::= #argv() [function, hook(KREFLECTION.argv)]

  syntax {Sort} String ::= #unparseKORE(Sort) [function, hook(KREFLECTION.printKORE)]
  syntax IOError ::= "#noParse" "(" String ")" [symbol(#noParse)]
endmodule
Concrete execution in K supports I/O operations. This functionality is not
supported during symbolic execution, because symbolic execution must exist
completely free of side-effects, and I/O is an irreducible type of side effect.
However, it is useful in many cases when defining concrete execution to be able
to make reference to I/O operations.
The design of these I/O operations is based on the POSIX standard, for the most
part. For example, the #read
K function maps to the read
POSIX function. We
do not at this time have a higher-level API for I/O, but this may be
implemented at some point in the future.
I/O operations generally return either their result, or an IOError
term
corresponding to the errno
returned by the underlying system call.
module K-IO imports private LIST imports private STRING imports private INT
Aside from EOF, which is returned by #getc
if the file is at end-of-file, all
of the below I/O errors correspond to possible values for errno
after calling
a library function. If the errno
returned is not one of the below errnos
known to K, #unknownIOError
is returned along with the integer errno value.
syntax IOError ::= "#EOF" [symbol(#EOF)] | #unknownIOError(errno: Int) [symbol(#unknownIOError)] | "#E2BIG" [symbol(#E2BIG)] | "#EACCES" [symbol(#EACCES)] | "#EAGAIN" [symbol(#EAGAIN)] | "#EBADF" [symbol(#EBADF)] | "#EBUSY" [symbol(#EBUSY)] | "#ECHILD" [symbol(#ECHILD)] | "#EDEADLK" [symbol(#EDEADLK)] | "#EDOM" [symbol(#EDOM)] | "#EEXIST" [symbol(#EEXIST)] | "#EFAULT" [symbol(#EFAULT)] | "#EFBIG" [symbol(#EFBIG)] | "#EINTR" [symbol(#EINTR)] | "#EINVAL" [symbol(#EINVAL)] | "#EIO" [symbol(#EIO)] | "#EISDIR" [symbol(#EISDIR)] | "#EMFILE" [symbol(#EMFILE)] | "#EMLINK" [symbol(#EMLINK)] | "#ENAMETOOLONG" [symbol(#ENAMETOOLONG)] | "#ENFILE" [symbol(#ENFILE)] | "#ENODEV" [symbol(#ENODEV)] | "#ENOENT" [symbol(#ENOENT)] | "#ENOEXEC" [symbol(#ENOEXEC)] | "#ENOLCK" [symbol(#ENOLCK)] | "#ENOMEM" [symbol(#ENOMEM)] | "#ENOSPC" [symbol(#ENOSPC)] | "#ENOSYS" [symbol(#ENOSYS)] | "#ENOTDIR" [symbol(#ENOTDIR)] | "#ENOTEMPTY" [symbol(#ENOTEMPTY)] | "#ENOTTY" [symbol(#ENOTTY)] | "#ENXIO" [symbol(#ENXIO)] | "#EPERM" [symbol(#EPERM)] | "#EPIPE" [symbol(#EPIPE)] | "#ERANGE" [symbol(#ERANGE)] | "#EROFS" [symbol(#EROFS)] | "#ESPIPE" [symbol(#ESPIPE)] | "#ESRCH" [symbol(#ESRCH)] | "#EXDEV" [symbol(#EXDEV)] | "#EWOULDBLOCK" [symbol(#EWOULDBLOCK)] | "#EINPROGRESS" [symbol(#EINPROGRESS)] | "#EALREADY" [symbol(#EALREADY)] | "#ENOTSOCK" [symbol(#ENOTSOCK)] | "#EDESTADDRREQ" [symbol(#EDESTADDRREQ)] | "#EMSGSIZE" [symbol(#EMSGSIZE)] | "#EPROTOTYPE" [symbol(#EPROTOTYPE)] | "#ENOPROTOOPT" [symbol(#ENOPROTOOPT)] | "#EPROTONOSUPPORT" [symbol(#EPROTONOSUPPORT)] | "#ESOCKTNOSUPPORT" [symbol(#ESOCKTNOSUPPORT)] | "#EOPNOTSUPP" [symbol(#EOPNOTSUPP)] | "#EPFNOSUPPORT" [symbol(#EPFNOSUPPORT)] | "#EAFNOSUPPORT" [symbol(#EAFNOSUPPORT)] | "#EADDRINUSE" [symbol(#EADDRINUSE)] | "#EADDRNOTAVAIL" [symbol(#EADDRNOTAVAIL)] | "#ENETDOWN" [symbol(#ENETDOWN)] | "#ENETUNREACH" [symbol(#ENETUNREACH)] | "#ENETRESET" [symbol(#ENETRESET)] | "#ECONNABORTED" [symbol(#ECONNABORTED)] | "#ECONNRESET" [symbol(#ECONNRESET)] | 
"#ENOBUFS" [symbol(#ENOBUFS)] | "#EISCONN" [symbol(#EISCONN)] | "#ENOTCONN" [symbol(#ENOTCONN)] | "#ESHUTDOWN" [symbol(#ESHUTDOWN)] | "#ETOOMANYREFS" [symbol(#ETOOMANYREFS)] | "#ETIMEDOUT" [symbol(#ETIMEDOUT)] | "#ECONNREFUSED" [symbol(#ECONNREFUSED)] | "#EHOSTDOWN" [symbol(#EHOSTDOWN)] | "#EHOSTUNREACH" [symbol(#EHOSTUNREACH)] | "#ELOOP" [symbol(#ELOOP)] | "#EOVERFLOW" [symbol(#EOVERFLOW)]
Here we see sorts defined to contain either an Int
or an IOError
, or
either a String
or an IOError
. These sorts are used to implement the
return sort of functions that may succeed, in which case they return a value,
or may fail, in which case they return an IOError term corresponding to the
errno reported by the underlying call.
syntax IOInt ::= Int | IOError syntax IOString ::= String | IOError
You can open a file in K using #open
. An optional mode indicates the file
open mode, which can have any value allowed by the fopen
function in C.
The returned value is the file descriptor that was opened, or an error.
syntax IOInt ::= "#open" "(" path: String ")" [function] | "#open" "(" path: String "," mode: String ")" [function, hook(IO.open), impure] rule #open(S:String) => #open(S:String, "r+")
You can get the current offset in a file using #tell
. You can also seek
to a particular offset using #seek
or #seekEnd
. #seek
is implemented via
a call to lseek
with the SEEK_SET
whence. #seekEnd
is implemented via a
call to lseek
with the SEEK_END
whence. You can emulate the SEEK_CUR
whence by means of #seek(FD, #tell(FD) +Int Offset)
.
syntax IOInt ::= "#tell" "(" fd: Int ")" [function, hook(IO.tell), impure] syntax K ::= "#seek" "(" fd: Int "," index: Int ")" [function, hook(IO.seek), impure] | "#seekEnd" "(" fd: Int "," fromEnd: Int ")" [function, hook(IO.seekEnd), impure]
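The SEEK_CUR emulation described above can be sketched as a pair of rules. Since #tell is declared to return IOInt rather than Int, this sketch first matches on the result so that an IOError is propagated instead of being passed to +Int. The module name and the constructors seekCur and #seekFrom are hypothetical.

```k
module SEEK-CUR-DEMO
  imports K-IO
  imports INT

  // Hypothetical constructs, introduced only for illustration.
  syntax KItem ::= seekCur(fd: Int, offset: Int)
                 | #seekFrom(fd: Int, position: IOInt, offset: Int)

  // Query the current offset first.
  rule seekCur(FD, Offset) => #seekFrom(FD, #tell(FD), Offset)

  // On success, seek to the absolute position current + Offset.
  rule #seekFrom(FD, Pos:Int, Offset) => #seek(FD, Pos +Int Offset)

  // On failure, leave the error at the top of the computation.
  rule #seekFrom(_, E:IOError, _) => E
endmodule
```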
You can read a single character from a file using #getc
. #EOF
is returned
if you are at end-of-file.
You can also read up to length
characters in a file using #read
. The
resulting read characters are returned, which may be fewer characters than
requested. A string of zero length being returned indicates end-of-file.
syntax IOInt ::= "#getc" "(" fd: Int ")" [function, hook(IO.getc), impure] syntax IOString ::= "#read" "(" fd: Int "," length: Int ")" [function, hook(IO.read), impure]
You can write a single character to a file using #putc
. You can also write
a string to a file using #write
. The returned value on success is .K
.
syntax K ::= "#putc" "(" fd: Int "," value: Int ")" [function, hook(IO.putc), impure] | "#write" "(" fd: Int "," value: String ")" [function, hook(IO.write), impure]
You can close a file using #close
. The returned value on success is .K
.
syntax K ::= "#close" "(" fd: Int ")" [function, hook(IO.close), impure]
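The open, read, and close operations can be chained by matching on the IOInt and IOString results at each step. The following is a sketch rather than a complete file reader: it performs a single #read of up to 4096 characters, and the module name and the constructors readFile, #opened, and #gotChunk are hypothetical.

```k
module READ-FILE-DEMO
  imports K-IO
  imports INT
  imports STRING

  // Hypothetical constructs, introduced only for illustration.
  syntax KItem ::= readFile(path: String)
                 | #opened(IOInt)
                 | #gotChunk(fd: Int, IOString)

  // Open the file read-only.
  rule readFile(Path) => #opened(#open(Path, "r"))

  // If a file descriptor came back, read from it; otherwise keep the error.
  rule #opened(FD:Int)     => #gotChunk(FD, #read(FD, 4096))
  rule #opened(E:IOError)  => E

  // Close the descriptor in both cases, then leave the result behind.
  rule #gotChunk(FD, S:String)  => #close(FD) ~> S
  rule #gotChunk(FD, E:IOError) => #close(FD) ~> E
endmodule
```

A fuller version would loop on #read until it returns a string of zero length, which the text above identifies as the end-of-file signal.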
You can lock or unlock parts of a file using the #lock
and #unlock
functions. The lock starts at the beginning of the file and continues for
endIndex
bytes. Note that Unix systems do not actually prevent locked files
from being read and modified; you will have to lock both sides of a concurrent
access to guarantee exclusivity.
syntax K ::= "#lock" "(" fd: Int "," endIndex: Int ")" [function, hook(IO.lock), impure] | "#unlock" "(" fd: Int "," endIndex: Int ")" [function, hook(IO.unlock), impure]
You can accept a connection on a socket using #accept
, or shut down the
write end of a socket with #shutdownWrite
. Note that no facility is provided
for opening, binding, and listening on sockets. These functions are implemented
in order to support creating stateful request/response servers where the
request loop is implemented using rewriting in K, but the connection
initialization is written in native code and linked into the LLVM backend.
syntax IOInt ::= "#accept" "(" fd: Int ")" [function, hook(IO.accept), impure] syntax K ::= "#shutdownWrite" "(" fd: Int ")" [function, hook(IO.shutdownWrite), impure]
You can get the current time in seconds since midnight UTC on January 1, 1970
using #time
.
syntax Int ::= "#time" "(" ")" [function, hook(IO.time), impure]
Provided here are functions that return the file descriptor for standard input,
standard output, and standard error.
syntax Int ::= "#stdin" [function, total] | "#stdout" [function, total] | "#stderr" [function, total] rule #stdin => 0 rule #stdout => 1 rule #stderr => 2
You can execute a command through the shell using the #system operator. Care
must be taken to sanitize inputs to this function or security issues may
result. Note that K has no facility for reasoning about logic that happens
outside its process, so any functionality that you wish to be able to formally
reason about in K should not be implemented via the #system operator.
syntax KItem ::= #system ( String ) [function, hook(IO.system), impure] | "#systemResult" "(" Int /* exit code */ "," String /* stdout */ "," String /* stderr */ ")" [symbol(#systemResult)]
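The following sketch shows one way the result of #system might be consumed: the call produces a #systemResult term, which later rules can match on by exit code. The module name, the runDate command, and the failed marker are hypothetical, and the command string is fixed so that no unsanitized input reaches the shell.

```k
module SYSTEM-EXAMPLE
  imports K-IO
  imports INT
  imports STRING

  // Hypothetical command and error marker, for illustration only.
  syntax KItem ::= "runDate"
                 | failed(Int, String)

  rule <k> runDate => #system("date") ... </k>

  // Exit code 0: keep the captured standard output.
  rule <k> #systemResult(0, Out:String, _Err:String) => Out ... </k>

  // Any other exit code: record the code and the captured standard error.
  rule <k> #systemResult(Code:Int, _Out:String, Err:String)
        => failed(Code, Err) ... </k>
    requires Code =/=Int 0
endmodule
```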
You can get a temporary file and open it atomically using the #mkstemp
operator. The resulting file will be closed and deleted when K rewriting ends.
For more info on the argument to #mkstemp, see man mkstemp.
syntax IOFile ::= #mkstemp(template: String) [function, hook(IO.mkstemp), impure] syntax IOFile ::= IOError | "#tempFile" "(" path: String "," fd: Int ")" [symbol(#tempFile)]
You can delete a file using its absolute or relative path using the #remove
operator. It returns .K on success or an IOError on failure.
syntax K ::= #remove(path: String) [function, total, hook(IO.remove), impure]
You can log information to disk using the #logToFile operator. Semantically,
this operator returns .K. However, it has a side effect that K does not reason
about: value will be written to a uniquely-identified file containing name in
its name. The file is only flushed to disk when rewriting finishes.
syntax K ::= #logToFile(name: String, value: String) [function, total, hook(IO.log), impure, returnsUnit, symbol(#logToFile)]
Strings can also be logged via the logging mechanisms available to the backend.
On the LLVM backend, this just means logging the text to standard error. On the
Haskell backend, a log message of type InfoUserLog is created with the
specified text.
syntax K ::= #log(value: String) [function, total, hook(IO.logString), impure, returnsUnit, symbol(#log)]
Terms can also be logged to standard error in surface syntax, rather than as
KORE, using #trace. This operator has similar semantics to #logToFile (i.e.
it returns .K, but prints as an impure side effect). Note that calling
#trace is equivalent to invoking the kprint tool for the first term that is
logged, which requires re-parsing the underlying K definition. Subsequent calls
do not incur this overhead again; the definition is cached.
syntax K ::= #trace(value: KItem) [function, total, hook(IO.traceTerm), impure, returnsUnit, symbol(#trace)] | #traceK(value: K) [function, total, hook(IO.traceTerm), impure, returnsUnit, symbol(#traceK)]
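A small sketch of how the logging operators might be placed in a rule, assuming the default <k> configuration. The countdown construct and the module name are hypothetical. Because #trace and #log both evaluate to .K, they can be sequenced in front of the real continuation without changing the result of the computation.

```k
module TRACE-EXAMPLE
  imports K-IO
  imports INT
  imports STRING

  // Hypothetical construct, for illustration only.
  syntax KItem ::= countdown(Int)

  // Print the current term in surface syntax, then continue counting.
  rule <k> countdown(N:Int) => #trace(countdown(N)) ~> countdown(N -Int 1) ... </k>
    requires N >Int 0

  // Emit a plain message through the backend's logging mechanism at the end.
  rule <k> countdown(0) => #log("countdown finished") ... </k>
endmodule
```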
Below is an implementation of the stream="stdin" and stream="stdout"
cell attributes in K. You should not refer to these symbols or modules directly
in your definition. This implementation is provided only so that the K compiler
can make use of it. For more information on how to use this feature, refer to
IMP++ in the K tutorial.
syntax Stream ::= #buffer(K) | #istream(Int) | #parseInput(String, String) | #ostream(Int) endmodule // NOTE: DO NOT DIRECTLY IMPORT *-STREAM MODULES // These stream modules will be automatically instantiated and implicitly imported // into the main module when `stream` attributes appear in configuration cells. // Only `Stream` productions and `[stream]` rules will be imported. // The cell name will be replaced with the one of the main configuration. module STDIN-STREAM imports K-IO imports K-REFLECTION imports LIST imports INT imports BOOL configuration <stdin> ListItem(#buffer($STDIN:String)) ListItem($IO:String) ListItem(#istream(#stdin)) </stdin> // read one character at a time until we read whitespace rule [stdinGetc]: <stdin> ListItem(#parseInput(_:String, Delimiters:String)) ListItem(#buffer(S:String => S +String chrChar({#getc(N)}:>Int))) ListItem("on") ListItem(#istream(N:Int)) </stdin> requires findChar(S, Delimiters, 0) ==Int -1 // [stdin] [stream, priority(200)] // when we reach whitespace, if it parses create a ListItem rule [stdinParseString]: <stdin> (ListItem(#parseInput("String", Delimiters:String)) => ListItem(S)) ListItem(#buffer(S:String => "")) _:List </stdin> requires findChar(S, Delimiters, 0) =/=Int -1 // [stdin] [stream] // a hack: handle the case when we read integers without the help of the IO server rule [stdinParseInt]: <stdin> (ListItem(#parseInput("Int", Delimiters:String)) => ListItem(String2Int(substrString(S, 0, findChar(S, Delimiters, 0))))) ListItem(#buffer(S:String => substrString(S,findChar(S, Delimiters, 0) +Int 1, lengthString(S)))) _:List </stdin> requires findChar(S, Delimiters, 0) =/=Int -1 andBool lengthString(S) >Int 1 // [stdin] [stream] rule [stdinTrim]: <stdin> ListItem(#parseInput(Sort:String, Delimiters:String)) ListItem(#buffer(S:String => substrString(S, 1, lengthString(S)))) _:List </stdin> requires findChar(S, Delimiters, 0) =/=Int -1 andBool Sort =/=String "String" andBool lengthString(S) <=Int 1 // [stdin] 
[stream] // NOTE: This unblocking rule will be instantiated and inserted carefully // when necessary according to user-defined rules, since otherwise it will // lead to a diverging (i.e., non-terminating) transition system definition. // Currently, it supports only a simple pattern matching on the top of the // input stream cell, e.g., // rule <k> read() => V ... </k> <in> ListItem(V:Int) => .List ... </in> // Non-supported rules that refer to the input stream cell in a sophisticated // way will get stuck in concrete execution mode with real IO enabled (i.e., // under `--io on` option), while they will still work in symbolic execution // mode or concrete execution mode with real IO disabled (i.e., under `--io // off`, `--search`, or `--debug` options). // // TODO: More patterns need to be supported as well. In that case, we need to // have a way to specify such patterns. rule [stdinUnblock]: <stdin> (.List => ListItem(#parseInput(?Sort:String, ?Delimiters:String))) ListItem(#buffer(_:String)) ... 
</stdin> /* syntax Stream ::= "#noIO" rule ListItem(#buffer(_)) (ListItem(#noIO) ListItem(#istream(_:Int)) => .List) [stdin] */ endmodule module STDOUT-STREAM imports K-IO imports LIST imports STRING configuration <stdout> ListItem(#ostream(#stdout)) ListItem($IO:String) ListItem(#buffer("")) </stdout> //configuration <stderr> ListItem(#ostream(#stderr)) ListItem($IO:String) ListItem(#buffer("")) </stderr> rule [stdoutBufferFloat]: <stdout> ListItem(#ostream(_)) ListItem(_) ListItem(#buffer(Buffer:String => Buffer +String Float2String(F))) (ListItem(F:Float) => .List) _:List </stdout> // [stdout, stderr] [stream, priority(25)] rule [stdoutBufferInt]: <stdout> ListItem(#ostream(_)) ListItem(_) ListItem(#buffer(Buffer:String => Buffer +String Int2String(I))) (ListItem(I:Int) => .List) _:List </stdout> // [stdout, stderr] [stream, priority(25)] rule [stdoutBufferString]: <stdout> ListItem(#ostream(_)) ListItem(_) ListItem(#buffer(Buffer:String => Buffer +String S)) (ListItem(S:String) => .List) _:List </stdout> // [stdout, stderr] [stream, priority(25)] // Send first char from the buffer to the server rule [stdoutWrite]: <stdout> ListItem(#ostream(N:Int => {#write(N, S) ~> N:Int}:>Int)) ListItem("on") ListItem(#buffer(S:String => "")) _:List </stdout> requires S =/=String "" // [stdout, stderr] [stream, priority(30)] /* syntax Stream ::= "#noIO" rule ListItem(#buffer(Buffer:String => Buffer +String Float2String(F))) (ListItem(F:Float) => .List) _:List [stdout, stderr] rule ListItem(#buffer(Buffer:String => Buffer +String Int2String(I))) (ListItem(I:Int) => .List) _:List [stdout, stderr] rule ListItem(#buffer(Buffer:String => Buffer +String S)) (ListItem(S:String) => .List) _:List [stdout, stderr] rule (ListItem(#ostream(_:Int)) ListItem(#noIO) => .List) ListItem(#buffer(_)) _:List [stdout, stderr] */ endmodule
Provided here is an implementation of arbitrarily large fixed-precision binary
integers in K. This type is hooked to an implementation of integers provided
by the backend, and in particular makes use of native machine integers for
certain sizes of integer. For arbitrary-precision integers, see the INT
module above.
The syntax of machine integers in K is the same as arbitrary-precision integers
(i.e., an optional sign followed by a sequence of digits) except that machine
integers always end in a suffix pN where N is an integer indicating the
width in bits of the integer. The MInt sort is parametric, and this is
reflected in the literals. For example, the sort of 0p8 is MInt{8}.
module MINT-SYNTAX /*@\section{Description} The MInt implements machine integers of arbitrary * bit width represented in 2's complement. */ syntax {Width} MInt{Width} [hook(MINT.MInt)] /*@ Machine integer of bit width and value. */ syntax {Width} MInt{Width} ::= r"[\\+\\-]?[0-9]+[pP][0-9]+" [token, prec(2), hook(MINT.literal)] endmodule module MINT imports MINT-SYNTAX imports private INT imports private BOOL
You can get the number of bits of width in an MInt using bitwidthMInt.
syntax {Width} Int ::= bitwidthMInt(MInt{Width}) [function, total, hook(MINT.bitwidth)]
You can convert from an MInt to an Int using the MInt2Signed and
MInt2Unsigned functions. An MInt does not have a sign; its sign is instead
reflected in how operators interpret its value either as a signed integer or as
an unsigned integer. Thus, you can interpret an MInt as a signed integer with
MInt2Signed, or as an unsigned integer using MInt2Unsigned.
You can also convert from an Int to an MInt using Int2MInt. Care must
be taken to ensure that the sort context where the Int2MInt operator appears
has the correct bitwidth, as this will influence the width of the resulting
MInt.
syntax {Width} Int ::= MInt2Signed(MInt{Width}) [function, total, hook(MINT.svalue)] | MInt2Unsigned(MInt{Width}) [function, total, hook(MINT.uvalue), smt-hook(bv2int)] syntax {Width} MInt{Width} ::= Int2MInt(Int) [function, total, hook(MINT.integer), smt-hook(int2bv)]
You can get the minimum and maximum values of a signed or unsigned MInt
with a specified bit width using sminMInt, smaxMInt, uminMInt, and
umaxMInt.
syntax Int ::= sminMInt(Int) [function] | smaxMInt(Int) [function] | uminMInt(Int) [function] | umaxMInt(Int) [function] rule sminMInt(N:Int) => 0 -Int (1 <<Int (N -Int 1)) rule smaxMInt(N:Int) => (1 <<Int (N -Int 1)) -Int 1 rule uminMInt(_:Int) => 0 rule umaxMInt(N:Int) => (1 <<Int N) -Int 1
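Unfolding the rules above for a width of 8 bits gives sminMInt(8) = 0 - (1 << 7) = -128, smaxMInt(8) = (1 << 7) - 1 = 127, uminMInt(8) = 0, and umaxMInt(8) = (1 << 8) - 1 = 255. As a minimal sketch of how these bounds might be combined, the hypothetical helper below (not part of the MINT module) counts the distinct values available at a given width.

```k
module MINT-BOUNDS-EXAMPLE
  imports MINT
  imports INT

  // Hypothetical helper: number of distinct values at a given bit width,
  // computed from the unsigned bounds.
  syntax Int ::= cardinality(Int) [function]
  rule cardinality(W:Int) => umaxMInt(W) -Int uminMInt(W) +Int 1
endmodule
```

With the values above, cardinality(8) unfolds to 255 - 0 + 1 = 256.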
You can check whether a specified Int falls outside the range of an MInt
with a specified width when interpreted as a signed or unsigned integer using
soverflowMInt and uoverflowMInt. Each function returns true when the Int
cannot be represented at that width without loss of precision.
syntax Bool ::= soverflowMInt(width: Int, Int) [function] | uoverflowMInt(width: Int, Int) [function] rule soverflowMInt(N:Int, I:Int) => I <Int sminMInt(N) orBool I >Int smaxMInt(N) rule uoverflowMInt(N:Int, I:Int) => I <Int uminMInt(N) orBool I >Int umaxMInt(N)
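For example, following the rules above, soverflowMInt(8, 128) is true because 128 exceeds smaxMInt(8) = 127, while uoverflowMInt(8, 128) is false because 128 lies between 0 and 255. The sketch below uses the signed check to guard an addition; the checkedAdd construct, the overflow marker, and the choice of 32 bits are hypothetical.

```k
module MINT-OVERFLOW-EXAMPLE
  imports MINT
  imports INT
  imports BOOL

  // Hypothetical construct: add two Ints, but only if the sum fits in a
  // signed 32-bit integer.
  syntax KItem ::= checkedAdd(Int, Int)
                 | "overflow"

  rule <k> checkedAdd(A:Int, B:Int) => A +Int B ... </k>
    requires notBool soverflowMInt(32, A +Int B)

  rule <k> checkedAdd(A:Int, B:Int) => overflow ... </k>
    requires soverflowMInt(32, A +Int B)
endmodule
```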
You can:
Compute the bitwise complement ~MInt of an MInt.
Compute the unary negation --MInt of an MInt.
Compute the product *MInt of two MInts.
Compute the quotient /sMInt of two MInts interpreted as signed integers.
Compute the modulus %sMInt of two MInts interpreted as signed integers.
Compute the quotient /uMInt of two MInts interpreted as unsigned integers.
Compute the modulus %uMInt of two MInts interpreted as unsigned integers.
Compute the sum +MInt of two MInts.
Compute the difference -MInt of two MInts.
Compute the left shift <<MInt of two MInts. The second MInt is always interpreted as positive.
Compute the arithmetic right shift >>aMInt of two MInts. The second MInt is always interpreted as positive.
Compute the logical right shift >>lMInt of two MInts. The second MInt is always interpreted as positive.
Compute the bitwise and &MInt of two MInts.
Compute the bitwise xor xorMInt of two MInts.
Compute the bitwise inclusive or |MInt of two MInts.
syntax {Width} MInt{Width} ::= "~MInt" MInt{Width} [function, total, hook(MINT.not), smt-hook(bvnot)] | "--MInt" MInt{Width} [function, total, hook(MINT.neg), smt-hook(bvuminus)] > left: MInt{Width} "*MInt" MInt{Width} [function, total, hook(MINT.mul), smt-hook(bvmul)] | MInt{Width} "/sMInt" MInt{Width} [function, hook(MINT.sdiv), smt-hook(bvsdiv)] | MInt{Width} "%sMInt" MInt{Width} [function, hook(MINT.srem), smt-hook(bvsrem)] | MInt{Width} "/uMInt" MInt{Width} [function, hook(MINT.udiv), smt-hook(bvudiv)] | MInt{Width} "%uMInt" MInt{Width} [function, hook(MINT.urem), smt-hook(bvurem)] > left: MInt{Width} "+MInt" MInt{Width} [function, total, hook(MINT.add), smt-hook(bvadd)] | MInt{Width} "-MInt" MInt{Width} [function, total, hook(MINT.sub), smt-hook(bvsub)] > left: MInt{Width} "<<MInt" MInt{Width} [function, hook(MINT.shl), smt-hook(bvshl)] | MInt{Width} ">>aMInt" MInt{Width} [function, hook(MINT.ashr), smt-hook(bvashr)] | MInt{Width} ">>lMInt" MInt{Width} [function, hook(MINT.lshr), smt-hook(bvlshr)] > left: MInt{Width} "&MInt" MInt{Width} [function, total, hook(MINT.and), smt-hook(bvand)] > left: MInt{Width} "xorMInt" MInt{Width} [function, total, hook(MINT.xor), smt-hook(bvxor)] > left: MInt{Width} "|MInt" MInt{Width} [function, total, hook(MINT.or), smt-hook(bvor)]
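As a sketch of how these operators compose, the hypothetical function below computes the rounded-down average of two unsigned 32-bit values without an intermediate overflow, using the identity a + b = 2(a & b) + (a xor b). The function name avg and the module name are illustrative. The bare declaration syntax MInt{32} reflects an assumption that each concrete width must be declared before use; this section does not state that requirement.

```k
module MINT-ARITH-EXAMPLE
  imports MINT

  // Assumption: the concrete width in use is declared explicitly.
  syntax MInt{32}

  // Hypothetical helper: floor((A + B) / 2) over unsigned 32-bit values.
  syntax MInt{32} ::= avg(MInt{32}, MInt{32}) [function]

  // Shared bits contribute fully; differing bits contribute half.
  rule avg(A:MInt{32}, B:MInt{32})
    => (A &MInt B) +MInt ((A xorMInt B) >>lMInt 1p32)
endmodule
```

The parentheses are written out in full so that the example does not depend on the relative priorities of the operators.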
You can compute whether one MInt is less than, less than or equal to, greater
than, or greater than or equal to another MInt when interpreted as signed
or unsigned integers. You can also compute whether one MInt is equal to or
unequal to another MInt.
syntax {Width} Bool ::= MInt{Width} "<sMInt" MInt{Width} [function, total, hook(MINT.slt), smt-hook(bvslt)] | MInt{Width} "<uMInt" MInt{Width} [function, total, hook(MINT.ult), smt-hook(bvult)] | MInt{Width} "<=sMInt" MInt{Width} [function, total, hook(MINT.sle), smt-hook(bvsle)] | MInt{Width} "<=uMInt" MInt{Width} [function, total, hook(MINT.ule), smt-hook(bvule)] | MInt{Width} ">sMInt" MInt{Width} [function, total, hook(MINT.sgt), smt-hook(bvsgt)] | MInt{Width} ">uMInt" MInt{Width} [function, total, hook(MINT.ugt), smt-hook(bvugt)] | MInt{Width} ">=sMInt" MInt{Width} [function, total, hook(MINT.sge), smt-hook(bvsge)] | MInt{Width} ">=uMInt" MInt{Width} [function, total, hook(MINT.uge), smt-hook(bvuge)] | MInt{Width} "==MInt" MInt{Width} [function, total, hook(MINT.eq), smt-hook(=)] | MInt{Width} "=/=MInt" MInt{Width} [function, total, hook(MINT.ne), smt-hook(distinct)]
You can compute the signed minimum sMinMInt, the signed maximum sMaxMInt,
the unsigned minimum uMinMInt, and the unsigned maximum uMaxMInt of two
MInts.
syntax {Width} MInt{Width} ::= sMaxMInt(MInt{Width}, MInt{Width}) [function, total, hook(MINT.smax), smt-hook((ite (bvslt #1 #2) #2 #1))] | sMinMInt(MInt{Width}, MInt{Width}) [function, total, hook(MINT.smin), smt-hook((ite (bvslt #1 #2) #1 #2))] | uMaxMInt(MInt{Width}, MInt{Width}) [function, total, hook(MINT.umax), smt-hook((ite (bvult #1 #2) #2 #1))] | uMinMInt(MInt{Width}, MInt{Width}) [function, total, hook(MINT.umin), smt-hook((ite (bvult #1 #2) #1 #2))]
You can convert an MInt of one width to another width with roundMInt.
The resulting MInt will be truncated starting from the most significant bit
if the resulting width is smaller than the input. The resulting MInt will be
zero-extended with the same low-order bits if the resulting width is larger
than the input.
syntax {Width1, Width2} MInt{Width1} ::= roundMInt(MInt{Width2}) [function, total, hook(MINT.round)] syntax {Width1, Width2} MInt{Width1} ::= signExtendMInt(MInt{Width2}) [function, total, hook(MINT.sext)]
endmodule
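The sketch below illustrates width conversion between 8 and 16 bits. The helper names are hypothetical. The explicit casts pin both the source and the target width, and the bare syntax MInt{8} and syntax MInt{16} declarations reflect an assumption that concrete widths must be declared before use. The signExtendMInt function is taken from the declaration above; its hook name (MINT.sext) suggests that it extends with copies of the sign bit rather than with zeros.

```k
module MINT-WIDEN-EXAMPLE
  imports MINT

  // Assumption: each concrete width in use is declared explicitly.
  syntax MInt{8}
  syntax MInt{16}

  // Hypothetical helpers, for illustration only.
  syntax MInt{16} ::= widenUnsigned(MInt{8}) [function, total]
                    | widenSigned(MInt{8})   [function, total]
  syntax MInt{8}  ::= lowByte(MInt{16})      [function, total]

  // Larger target width: the low-order bits are preserved.
  rule widenUnsigned(X:MInt{8}) => roundMInt(X)::MInt{16}
  rule widenSigned(X:MInt{8})   => signExtendMInt(X)::MInt{16}

  // Smaller target width: the high-order bits are dropped.
  rule lowByte(X:MInt{16}) => roundMInt(X)::MInt{8}
endmodule
```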
Defined below is a series of modules used to parse inner syntax in K (i.e., the
contents of rules, configuration declarations, and contexts).
Much of this file exists in tight correspondence with the K implementation, and
K will not work correctly if it is altered without corresponding changes to the
source code of the K tools.
Users should only import a few modules from this file. In particular, this
includes SORT-K, BASIC-K, ML-SYNTAX, DEFAULT-LAYOUT,
DEFAULT-CONFIGURATION, and K-AMBIGUITIES. The remaining modules should not
be imported by the user; they are used implicitly by the implementation of K.
The SORT-K module declares the K sort, and nothing else.
module SORT-K syntax K [hook(K.K)] endmodule
The BASIC-K module declares the K, KItem, and KConfigVar sorts, and
imports the syntax of matching logic.
module BASIC-K imports ML-SYNTAX imports SORT-BOOL syntax KItem [hook(K.KItem)] syntax K ::= KItem syntax KConfigVar [token] syntax KItem ::= KConfigVar endmodule
Below is defined the abstract syntax of concrete terms in K, the KAST syntax.
Users should rarely if ever have to refer to this syntax; in general, it
suffices to use concrete syntax in rules, configuration declarations, contexts,
etc.
This syntax is used directly by the K implementation, and exists here as a
reference for the syntax of KAST, but it should not be imported directly by
the user.
module KSTRING syntax KString ::= r"[\\\"](([^\\\"\\n\\r\\\\])|([\\\\][nrtf\\\"\\\\])|([\\\\][x][0-9a-fA-F]{2})|([\\\\][u][0-9a-fA-F]{4})|([\\\\][U][0-9a-fA-F]{8}))*[\\\"]" [token] // optionally qualified strings, like in Scala "abc", i"abc", r"a*bc", etc. endmodule module BUILTIN-ID-TOKENS syntax #LowerId ::= r"[a-z][a-zA-Z0-9]*" [prec(2), token] syntax #UpperId ::= r"[A-Z][a-zA-Z0-9]*" [prec(2), token] endmodule module SORT-KBOTT imports SORT-K syntax KBott endmodule module KAST imports BASIC-K imports SORT-KBOTT imports KSTRING imports BUILTIN-ID-TOKENS syntax KBott ::= "#token" "(" KString "," KString ")" [symbol(#KToken)] | "#klabel" "(" KLabel ")" [symbol(#WrappedKLabel)] | KLabel "(" KList ")" [symbol(#KApply)] syntax KItem ::= KBott syntax KLabel ::= r"`(\\\\`|\\\\\\\\|[^`\\\\\\n\\r])+`" [token] | #LowerId [token] | r"[#a-z][a-zA-Z0-9]*" [token, prec(1)] syntax KList ::= K | ".KList" [symbol(#EmptyKList)] | KList "," KList [symbol(#KList), left, assoc, unit(#EmptyKList), prefer] endmodule // To be used when parsing/pretty-printing ground configurations module KSEQ imports KAST imports K-TOP-SORT syntax K ::= ".K" [symbol(#EmptyK)] | "." [symbol(#EmptyK), deprecated, unparseAvoid] syntax K ::= K "~>" K [symbol(#KSequence), left, assoc, unit(#EmptyK)] syntax left #KSequence syntax {Sort} Sort ::= "(" Sort ")" [bracket, group(defaultBracket), applyPriority(1)] endmodule
K provides direct access to the symbols of Matching Logic, while giving them
their own concrete syntax distinct from the syntax of the KORE intermediate
representation. These symbols are primarily used during symbolic execution.
The LLVM Backend has relatively little understanding of Matching Logic directly
and use of these symbols directly in rules is likely to cause it to crash.
However, these symbols are necessary when providing lemmas and other types of
logical assistance to proofs and symbolic execution in the Haskell Backend.
The correspondence between K symbols and KORE symbols is as follows:
#Top - \top
#Bottom - \bottom
#Not - \not
#Ceil - \ceil
#Floor - \floor
#Equals - \equals
#And - \and
#Or - \or
#Implies - \implies
#Exists - \exists
#Forall - \forall
#AG - allPathGlobally
#wEF - weakExistsFinally
#wAF - weakAlwaysFinally
module ML-SYNTAX [not-lr1] imports SORT-K syntax {Sort} Sort ::= "#Top" [symbol(#Top), group(mlUnary)] | "#Bottom" [symbol(#Bottom), group(mlUnary)] | "#Not" "(" Sort ")" [symbol(#Not), mlOp, group(mlUnary, mlOp)] syntax {Sort1, Sort2} Sort2 ::= "#Ceil" "(" Sort1 ")" [symbol(#Ceil), mlOp, group(mlUnary, mlOp)] | "#Floor" "(" Sort1 ")" [symbol(#Floor), mlOp, group(mlUnary, mlOp)] | "{" Sort1 "#Equals" Sort1 "}" [symbol(#Equals), mlOp, group(mlEquals, mlOp), comm, format(%1%i%n%2%d%n%3%i%n%4%d%n%5)] syntax priority mlUnary > mlEquals > mlAnd syntax {Sort} Sort ::= Sort "#And" Sort [symbol(#And), assoc, left, comm, unit(#Top), mlOp, group(mlAnd, mlOp), format(%i%1%d%n%2%n%i%3%d)] > Sort "#Or" Sort [symbol(#Or), assoc, left, comm, unit(#Bottom), mlOp, group(mlOp), format(%i%1%d%n%2%n%i%3%d)] > Sort "#Implies" Sort [symbol(#Implies), mlOp, group(mlImplies, mlOp), format(%i%1%d%n%2%n%i%3%d)] syntax priority mlImplies > mlQuantifier syntax {Sort1, Sort2} Sort2 ::= "#Exists" Sort1 "." Sort2 [symbol(#Exists), mlOp, mlBinder, group(mlQuantifier, mlOp)] | "#Forall" Sort1 "." Sort2 [symbol(#Forall), mlOp, mlBinder, group(mlQuantifier, mlOp)] syntax {Sort} Sort ::= "#AG" "(" Sort ")" [symbol(#AG), mlOp, group(mlOp)] | "#wEF" "(" Sort ")" [symbol(weakExistsFinally), mlOp, group(mlOp)] | "#wAF" "(" Sort ")" [symbol(weakAlwaysFinally), mlOp, group(mlOp)] endmodule
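As a sketch of the kind of lemma these symbols are used for, the modules below state when a hypothetical partial function is defined. The function quot and both module names are illustrative. The lemma is placed in a module marked symbolic on the assumption that it is only relevant to the Haskell backend, and it uses set variables (prefixed with @), mirroring the style commonly used for #Ceil simplifications.

```k
module QUOT
  imports INT
  imports BOOL

  // Hypothetical partial function: undefined when the divisor is zero.
  syntax Int ::= quot(Int, Int) [function]
  rule quot(A:Int, B:Int) => A /Int B requires B =/=Int 0
endmodule

module QUOT-LEMMAS [symbolic]
  imports QUOT

  // quot(A, B) is defined exactly when B is nonzero and both arguments
  // are themselves defined.
  rule #Ceil(quot(@A:Int, @B:Int))
    => {(@B =/=Int 0) #Equals true} #And #Ceil(@A) #And #Ceil(@B)
    [simplification]
endmodule
```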
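Provided below is the syntax of variables in K. There are four types of
variables in K:
1. Regular variables. These begin with an underscore or a capital letter. They
   match exactly one value and can be used to refer to it on the
   right-hand side.
2. Fresh constants. These begin with a !. This refers to a unique value of the
   specified sort that is distinct from any other value generated by the !X
   syntax.
3. Existential variables. These begin with a ?. They are not required to appear
   on the left-hand side even if they appear on the right-hand side.
4. Set variables. These begin with a @.
There is also a fifth type of "variable", although it is not technically a
variable. This refers to configuration variables, which are used to insert
values into the initial configuration that come from outside the semantics.
The most common of these is the $PGM variable, which conventionally contains
the program being executed and is placed in the <k> cell in the configuration
declaration. These "variables" begin with a $ and their values are populated
by the frontend prior to symbolic or concrete execution of a program.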
module KVARIABLE-SYNTAX syntax #KVariable endmodule // To be used when parsing/pretty-printing symbolic configurations module KSEQ-SYMBOLIC imports KSEQ imports ML-SYNTAX imports KVARIABLE-SYNTAX syntax #KVariable ::= r"(\\!|\\?|@)?([A-Z][A-Za-z0-9'_]*|_|_[A-Z][A-Za-z0-9'_]*)" [token, prec(1)] | #UpperId [token] syntax KConfigVar ::= r"(\\$)([A-Z][A-Za-z0-9'_]*)" [token] syntax KBott ::= #KVariable syntax KBott ::= KConfigVar endmodule
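The sketch below shows three of these prefixes in rules, assuming the default <k> configuration. The commands fresh, unknown, and swap are hypothetical. The rule that introduces ?X is intended for symbolic execution, where an unconstrained value is meaningful.

```k
module VARIABLE-KINDS-EXAMPLE
  imports INT

  // Hypothetical commands, for illustration only.
  syntax KItem ::= "fresh"
                 | "unknown"
                 | swap(Int, Int)

  // Regular variables: bound on the left-hand side, reused on the right.
  rule <k> swap(X:Int, Y:Int) => swap(Y, X) ~> .K ... </k>
    requires X >Int Y

  // Fresh constant: each application yields an Int that no earlier use of
  // the ! syntax has produced.
  rule <k> fresh => !I:Int ... </k>

  // Existential variable: appears only on the right-hand side.
  rule <k> unknown => ?V:Int ... </k>
endmodule
```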
While the backend treats cells as regular productions like any other, the
frontend provides a significant amount of convenience notation for dealing with
groups of cells, in order to make writing modular definitions easier. As a
result, we need a syntax for groups of cells and for referring to cells within
rules, configuration declarations, and functions.
For historical reasons, the Bag sort is used to refer to groups of cells.
This may change in a future release. Users can combine cells in any order
by concatenating them together, and can refer to the absence of any cells with
the .Bag symbol. You can also refer to cells within a function by placing
the cell context symbol, [[ K ]], at the top of a rule, placing a function
symbol inside, and referring to cells afterwards. This implicitly inserts
a reference to the configuration as it was immediately before the current
rule was applied, which can be matched on within the function. Functions with
such context cannot be referred to in the initial configuration, because the
prior configuration does not yet exist.
module KCELLS imports KAST syntax Cell syntax Bag ::= Bag Bag [left, assoc, symbol(#cells), unit(#cells)] | ".Bag" [symbol(#cells)] | ".::Bag" [symbol(#cells)] | Cell syntax Bag ::= "(" Bag ")" [bracket] syntax KItem ::= Bag syntax #RuleBody ::= "[" "[" K "]" "]" Bag [symbol(#withConfig), avoid] syntax non-assoc #withConfig syntax Bag ::= KBott endmodule
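A minimal sketch of a function that reads a cell through the cell context symbol follows. The <limit> cell, its initial value, and the withinLimit function are hypothetical.

```k
module CONTEXT-FUNCTION-EXAMPLE
  imports INT
  imports BOOL

  configuration
    <top>
      <k> $PGM:K </k>
      <limit> 10 </limit>
    </top>

  // Hypothetical function whose result depends on the <limit> cell.
  syntax Bool ::= withinLimit(Int) [function]

  // The function symbol goes inside [[ ... ]]; the cells it reads follow.
  rule [[ withinLimit(N:Int) => N <=Int L ]]
       <limit> L:Int </limit>
endmodule
```

A rule elsewhere in the definition could then use withinLimit(N) in its requires clause without mentioning the <limit> cell itself.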
Users can also refer to cells in rules. When doing so, an optional ... can
be placed immediately after the start of the cell or immediately before the
end. In a cell whose contents are commutative, these are equivalent to one
another and are also equivalent to placing ... in both places. This means
that what is placed in the cell will be combined with the cell contents'
concatenation operator with an unnamed variable. In other words, you match on
some number of elements in the collection and do not care about the rest of
the collection.
In a cell whose contents are not commutative, the ... operators correspond
to a variable on the respective side of the contents of the cell that the
... appears. For example, <foo>... L </foo>, if L is a list, means
some number of elements followed by L. Note that not all combinations are
supported. Cells whose contents are sort K can only have ... appear at the
tail of the cell, and cells whose contents are sort List can only have ...
appear on at most one side in a single rule.
module RULE-CELLS imports KCELLS imports RULE-LISTS // if this module is imported, the parser automatically // generates, for all productions that have the attribute 'cell' or 'maincell', // a production like below: //syntax Cell ::= "<top>" #OptionalDots K #OptionalDots "</top>" [symbol(<top>)] syntax #OptionalDots ::= "..." [symbol(#dots)] | "" [symbol(#noDots)] syntax Int // this production will be added by the compiler to help handle bang variables, // however, it is valuable to put it here because without this production, it // will not exist at the point in time when rules and claims are parsed, and // as a result it makes it very difficult to write proof claims over fragments // of code that exercise rules containing bang variables. We put it here because // this production will "vanish" after parsing finishes and not be picked up // by the compiler, which is the behavior we want in this case since an actual // production will be generated by the compiler later on. syntax GeneratedCounterCell ::= "<generatedCounter>" Int "</generatedCounter>" [cell, symbol(<generatedCounter>), internal] endmodule
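The three cases described above can be sketched in one small definition. The cell names <store> and <out> and the commands load and emit are hypothetical.

```k
module DOTS-EXAMPLE
  imports INT
  imports LIST
  imports MAP

  configuration
    <T>
      <k> $PGM:K </k>
      <store> .Map </store>
      <out> .List </out>
    </T>

  // Hypothetical commands, for illustration only.
  syntax KItem ::= load(Int) | emit(Int)

  // <k> holds sort K, so the dots appear only at the tail.
  // <store> holds a Map, which is commutative: the entry may be anywhere.
  rule <k> load(Key:Int) => V ... </k>
       <store> ... Key |-> V ... </store>

  // <out> holds a List: dots on the left only, so the new item is appended
  // after whatever is already there.
  rule <k> emit(I:Int) => .K ... </k>
       <out> ... (.List => ListItem(I)) </out>
endmodule
```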
Users can also declare cells in a configuration declaration. This generates a
specific set of productions that is used internally to implement the cell. The
most important of these is the cell itself, and attributes on this production
can be specified in an xml-attribute-like syntax.
You can also use an xml-short-tag-like syntax to compose configuration cells
together which were defined in different modules. However, it is a requirement
that any K definition have at most one fully-composed configuration; thus, all
other configuration declarations must appear composed within another
configuration declaration.
module CONFIG-CELLS imports KCELLS imports RULE-LISTS syntax #CellName ::= r"[a-zA-Z][a-zA-Z0-9\\-]*" [token, prec(1)] | #LowerId [token] | #UpperId [token] syntax Cell ::= "<" #CellName #CellProperties ">" K "</" #CellName ">" [symbol(#configCell)] syntax Cell ::= "<" #CellName "/>" [symbol(#externalCell)] syntax #CellProperties ::= #CellProperty #CellProperties [symbol(#cellPropertyList)] | "" [symbol(#cellPropertyListTerminator)] syntax #CellProperty ::= #CellName "=" KString [symbol(#cellProperty)] endmodule
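As a sketch of both notations, the definition below declares a cell in one module and composes it into the main configuration with the short-tag form. The module names and the <counter> cell are hypothetical; the exit attribute on <exit-code> is an example of the xml-attribute-like syntax.

```k
module COUNTER-CONFIG-EXAMPLE
  imports INT

  // A configuration fragment declared on its own.
  configuration <counter> 0 </counter>
endmodule

module MAIN-CONFIG-EXAMPLE
  imports COUNTER-CONFIG-EXAMPLE

  // The single fully-composed configuration; <counter/> pulls in the
  // declaration from the imported module.
  configuration
    <top>
      <k> $PGM:K </k>
      <counter/>
      <exit-code exit=""> 0 </exit-code>
    </top>
endmodule
```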
Rules can have an optional requires clause or an ensures clause. For backwards
compatibility, you can refer to the requires clause with both the requires
and when keywords; the latter, however, is deprecated and may be removed in
a future release.
The requires clause specifies the preconditions that must be true in order
for the rule to apply. The ensures clause specifies the information which
becomes true after the rule has applied. It is a requirement that information
present in the ensures clause refer to existential variables only.
When doing concrete execution, you can think of the requires clause as a
side-condition. In other words, even if the rule matches, it will not apply
unless the requires clause, which must be of sort Bool, evaluates to true.
module REQUIRES-ENSURES imports BASIC-K syntax #RuleBody ::= K syntax #RuleContent ::= #RuleBody [symbol("#ruleNoConditions")] | #RuleBody "requires" Bool [symbol("#ruleRequires")] | #RuleBody "ensures" Bool [symbol("#ruleEnsures")] | #RuleBody "requires" Bool "ensures" Bool [symbol("#ruleRequiresEnsures")] endmodule
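A short sketch of both clauses, assuming the default <k> configuration. The commands halve and pickPositive are hypothetical. The second rule is meant for symbolic execution, where the ensures clause adds a constraint on the newly introduced existential variable.

```k
module REQUIRES-ENSURES-EXAMPLE
  imports INT
  imports BOOL

  // Hypothetical commands, for illustration only.
  syntax KItem ::= halve(Int)
                 | "pickPositive"

  // Side condition: the rule matches any halve(N) but applies only when
  // N is even.
  rule <k> halve(N:Int) => N /Int 2 ... </k>
    requires N %Int 2 ==Int 0

  // Postcondition: after the rule applies, ?X is known to be positive.
  rule <k> pickPositive => ?X:Int ... </k>
    ensures ?X >Int 0
endmodule
```

With these rules, halve(3) matches the first rule's left-hand side but remains stuck, because the requires clause evaluates to false.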
The below modules are used in various ways as indicators to the implementation
that certain automatically generated syntax should be created by the parser.
These modules should not be imported directly by the user.
module K-TOP-SORT imports SORT-KBOTT syntax KItem ::= KBott syntax {Sort} KItem ::= Sort endmodule module K-BOTTOM-SORT imports SORT-KBOTT syntax KItem ::= KBott syntax {Sort} Sort ::= KBott endmodule module K-SORT-LATTICE imports K-TOP-SORT imports K-BOTTOM-SORT endmodule module AUTO-CASTS // if this module is imported, the parser automatically // generates, for all sorts, productions of the form: // Sort ::= Sort ":Sort" // semantic cast - force the inner term to be `Sort` or a subsort // Sort ::= Sort "::Sort" // strict cast - force the inner term to be exactly `Sort`. Useful for disambiguation // Sort ::= "{" Sort "}" "::Sort" // synonym for strict cast // Sort ::= "{" K "}" ":>Sort" // projection cast. Allows any term to be placed in a context that expects `Sort` // this is part of the mechanism that allows concrete user syntax in K endmodule module AUTO-FOLLOW // if this module is imported, the parser automatically // generates a follow restriction for every terminal which is a prefix // of another terminal. This is useful to prevent ambiguities such as: // syntax K ::= "a" // syntax K ::= "b" // syntax K ::= "ab" // syntax K ::= K K // #parse("ab", "K") // In the above example, the terminal "a" is not allowed to be followed by a "b" // because it would turn the terminal into the terminal "ab". 
endmodule module PROGRAM-LISTS imports SORT-K // if this module is imported, the parser automatically // replaces the default productions for lists: // Es ::= E "," Es [userList("*"), symbol('_,_)] // | ".Es" [userList("*"), symbol('.Es)] // into a series of productions more suitable for programs: // Es#Terminator ::= "" [symbol('.Es)] // Ne#Es ::= E "," Ne#Es [symbol('_,_)] // | E Es#Terminator [symbol('_,_)] // Es ::= Ne#Es // | Es#Terminator // if the list is * endmodule module RULE-LISTS // if this module is imported, the parser automatically // adds the subsort production to the parsing module only: // Es ::= E [userList("*")] endmodule module RECORD-PRODUCTIONS // if this module is imported, prefix productions of the form // syntax Sort ::= name(Args) // will be able to be parsed with don't-care variables according // to their nonterminal's names endmodule module SORT-PREDICATES // if this module is imported, the Bool sort will be annotated with // syntax Bool ::= isSort(K) [function] // and all sorts will be annotated with // syntax Sort ::= project:Sort(K) [function] endmodule
Certain additional features are available when parsing the contents of rules
and contexts. For more information on each of these, refer to K's
documentation.
module KREWRITE syntax {Sort} Sort ::= Sort "=>" Sort [symbol(#KRewrite)] syntax non-assoc #KRewrite syntax priority #KRewrite > #withConfig endmodule // To be used to parse semantic rules module K imports KSEQ-SYMBOLIC imports REQUIRES-ENSURES imports RECORD-PRODUCTIONS imports SORT-PREDICATES imports K-SORT-LATTICE imports AUTO-CASTS imports AUTO-FOLLOW imports KREWRITE syntax {Sort} Sort ::= Sort "#as" Sort [symbol(#KAs)] // functions that preserve sorts and can therefore have inner rewrites syntax {Sort} Sort ::= "#fun" "(" Sort ")" "(" Sort ")" [symbol(#fun2), prefer] // functions that do not preserve sort and therefore cannot have inner rewrites syntax {Sort1, Sort2} Sort1 ::= "#fun" "(" Sort2 "=>" Sort1 ")" "(" Sort2 ")" [symbol(#fun3)] syntax {Sort1, Sort2} Sort1 ::= "#let" Sort2 "=" Sort2 "#in" Sort1 [symbol(#let)] /*@ Set membership over terms. In addition to equality over concrete patterns, K also supports computing equality between a concrete pattern and a symbolic pattern. This is compiled efficiently down to pattern matching, and can be used by putting a term with unbound variables in the left child of :=K or =/=K. Note that this does not bind variables used on the lhs however (although this may change in the future).*/ syntax Bool ::= left: K ":=K" K [function, total, symbol(_:=K_), group(equalEqualK)] | K ":/=K" K [function, total, symbol(_:/=K_), group(notEqualEqualK)] endmodule // To be used to parse terms in full K module K-TERM imports KSEQ-SYMBOLIC imports RECORD-PRODUCTIONS imports SORT-PREDICATES imports K-SORT-LATTICE imports AUTO-CASTS imports AUTO-FOLLOW imports KREWRITE endmodule
When constructing a scanner for use during parsing, often you wish to ignore
certain types of text, such as whitespace and comments. However, the specific
syntax which each language must ignore is a little different from language
to language, and thus you wish to specify it manually. You can do this by
defining productions of the #Layout sort. For more information, refer to
K's documentation. The module below is implicitly imported if no
productions are declared of sort #Layout. It is also used
for the purposes of parsing K rules. If you wish to declare a language with
no layout productions, simply create a sort declaration for the #Layout sort
in your code (e.g. syntax #Layout).
module DEFAULT-LAYOUT
  syntax #Layout ::= r"(\\/\\*([^\\*]|(\\*+([^\\*\\/])))*\\*+\\/)" // C-style multi-line comments
                   | r"(\\/\\/[^\\n\\r]*)"                         // C-style single-line comments
                   | r"([\\ \\n\\r\\t])"                           // Whitespace
endmodule
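To get a feel for what the three default layout patterns accept, here is a small Python sketch. It is an illustration only, not how K's scanner is implemented: the regular expression is a hand translation of the patterns above into Python's `re` syntax, and `split_on_layout` is a hypothetical helper that simply cuts the input wherever layout matches.

```python
import re

# Hand translation of the three DEFAULT-LAYOUT alternatives (an assumption,
# not generated from the K definition).
LAYOUT = re.compile(
    r"/\*([^*]|(\*+[^*/]))*\*+/"  # C-style multi-line comment
    r"|//[^\n\r]*"                # C-style single-line comment
    r"|[ \n\r\t]"                 # a single whitespace character
)

def split_on_layout(text):
    """Return the maximal runs of non-layout characters in text."""
    chunks, current, pos = [], "", 0
    while pos < len(text):
        match = LAYOUT.match(text, pos)
        if match:
            # Layout ends the current chunk and is then skipped.
            if current:
                chunks.append(current)
                current = ""
            pos = match.end()
        else:
            current += text[pos]
            pos += 1
    if current:
        chunks.append(current)
    return chunks

print(split_on_layout("lambda x /* bound */ . x // identity\n"))
# ['lambda', 'x', '.', 'x']
```

A real scanner tokenizes with the language's own token rules; the sketch only shows which pieces of text the layout patterns cause to be ignored.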
If the user has no configuration declaration in their semantics, the below
configuration declaration will be implicitly imported.
module DEFAULT-CONFIGURATION
  imports BASIC-K

  configuration <k> $PGM:K </k>
endmodule
On occasion, it may be desirable to parse a language with an ambiguous grammar
when parsing a program, and perform additional semantic analysis at a later
time in order to resolve the ambiguities. A good example of this is as a
substitute for the lexer hack in parsers of the C
programming language.
The following module contains a declaration for ambiguities in K. Usually,
an ambiguous parse is an error. However, when you use the --gen-glr-parser
flag to kast, or the --gen-glr-bison-parser flag to kompile, ambiguities
instead become instances of the below parametric production, which you can
disambiguate as necessary using regular K rules.
module K-AMBIGUITIES
  syntax {Sort} Sort ::= amb(Sort, Sort) [symbol(amb)]
endmodule
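The idea of keeping both parses and choosing later can be sketched in Python. Everything here is hypothetical and for illustration only: terms are nested tuples, an ambiguity is a tuple labelled "amb", and the labels "decl" and "mul" stand for the two readings of the C statement `T * x;` that the lexer hack normally distinguishes.

```python
# Terms are nested tuples (label, child, ...). An ambiguity node is
# ("amb", first_parse, second_parse), mirroring amb(Sort, Sort).

def is_consistent(term, typedef_names):
    """Hypothetical check: does this parse agree with the known type names?"""
    if term[0] == "decl":   # "T * x" read as a pointer declaration
        return term[1] in typedef_names
    if term[0] == "mul":    # "T * x" read as a multiplication
        return term[1] not in typedef_names
    return True

def resolve(term, typedef_names):
    """Replace every amb node by the first alternative that is consistent."""
    if not isinstance(term, tuple):
        return term
    if term[0] == "amb":
        for alternative in term[1:]:
            candidate = resolve(alternative, typedef_names)
            if is_consistent(candidate, typedef_names):
                return candidate
        raise ValueError("no alternative is consistent")
    return (term[0],) + tuple(resolve(child, typedef_names) for child in term[1:])

ambiguous = ("stmt", ("amb", ("decl", "T", "x"), ("mul", "T", "x")))
print(resolve(ambiguous, {"T"}))   # T is a typedef name: declaration
print(resolve(ambiguous, set()))   # T is an ordinary variable: multiplication
```

In a K definition the same decision would be written as ordinary rules that match on `amb(...)` and rewrite it to the chosen alternative.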
Another feature of K's Bison parser is the ability to annotate terms parsed
with location information about the file and line where they occurred. For
more information about how to use this, refer to K's documentation. However,
the below module exists to provide a user syntax for the annotations that
are generated by the parser.
module K-LOCATIONS
  imports STRING-SYNTAX
  imports INT-SYNTAX

  // filename, startLine, startCol, endLine, endCol
  syntax {Sort} Sort ::= #location(Sort, String, Int, Int, Int, Int) [symbol(#location), format(%3)]
endmodule
The following files, integral to defining semantics in K, are automatically
required by every definition via this file. This behavior can be disabled
via kompile --no-prelude; however, semantics will likely break unless
they provide their own versions of these files, which are assumed to exist
by the compiler. There are, however, circumstances where passing this flag is
appropriate, such as if you are manually requiring these files in your
definition, if your definition was automatically condensed into a single file
with kompile -E, or if you wish to modify the inner syntax of K by providing
your own version of these files with different syntax.
requires "kast.md"
requires "domains.md"
The K Foreign Function Interface (FFI) module provides a way to call native
functions directly from a K semantics using the C ABI. It also provides
utilities for allocating and deallocating byte buffers with static addresses
that are suitable for being passed to native code.
It is built off of the underlying libffi library
(https://sourceware.org/libffi/) and is subject to some of the same
limitations as that library. Bear in mind, because this library exposes
a number of unsafe C APIs directly, misuse of the library is likely to lead
to memory corruption in your interpreter and can cause segmentation faults or
corrupted term representations that lead to undefined behavior at runtime.
requires "domains.md"

module FFI-SYNTAX
  imports private LIST
The FFIType sort is used to declare the native C ABI types of operands passed
to the #ffiCall function. These types roughly correspond to the types
declared in ffi.h by libffi.
  syntax FFIType ::= "#void" [symbol(#ffi_void)]
                   | "#uint8" [symbol(#ffi_uint8)]
                   | "#sint8" [symbol(#ffi_sint8)]
                   | "#uint16" [symbol(#ffi_uint16)]
                   | "#sint16" [symbol(#ffi_sint16)]
                   | "#uint32" [symbol(#ffi_uint32)]
                   | "#sint32" [symbol(#ffi_sint32)]
                   | "#uint64" [symbol(#ffi_uint64)]
                   | "#sint64" [symbol(#ffi_sint64)]
                   | "#float" [symbol(#ffi_float)]
                   | "#double" [symbol(#ffi_double)]
                   | "#uchar" [symbol(#ffi_uchar)]
                   | "#schar" [symbol(#ffi_schar)]
                   | "#ushort" [symbol(#ffi_ushort)]
                   | "#sshort" [symbol(#ffi_sshort)]
                   | "#uint" [symbol(#ffi_uint)]
                   | "#sint" [symbol(#ffi_sint)]
                   | "#ulong" [symbol(#ffi_ulong)]
                   | "#slong" [symbol(#ffi_slong)]
                   | "#longdouble" [symbol(#ffi_longdouble)]
                   | "#pointer" [symbol(#ffi_pointer)]
                   | "#complexfloat" [symbol(#ffi_complexfloat)]
                   | "#complexdouble" [symbol(#ffi_complexdouble)]
                   | "#complexlongdouble" [symbol(#ffi_complexlongdouble)]
                   | "#struct" "(" List ")" [symbol(#ffi_struct)]
endmodule

module FFI
  imports FFI-SYNTAX
  imports private BYTES
  imports private STRING
  imports private BOOL
  imports private LIST
  imports private INT
The #ffiCall functions are designed to call a native C ABI function and
return a native result. They come in three variants:
In the first variant, #ffiCall(Address, Args, ArgTypes, ReturnType) takes
an integer address of a function (which can be obtained from
#functionAddress), a List of Bytes containing the arguments of the
function, a List of FFITypes containing the types of the parameters of the
function, and an FFIType containing the return type of the function, and
returns the return value of the function as a Bytes.
syntax Bytes ::= "#ffiCall" "(" Int "," List "," List "," FFIType ")" [function, hook(FFI.call)]
In the second variant,
#ffiCall(Address, Args, FixedTypes, VariadicTypes, ReturnType) takes an
integer address of a function, a List of Bytes containing the arguments
of the call, a List of FFITypes containing the types of the fixed
parameters of the function, a List of FFITypes containing the types of the
variadic parameters of the function, and an FFIType containing the return
type of the function, and returns the return value of the function as a
Bytes.
syntax Bytes ::= "#ffiCall" "(" Int "," List "," List "," List "," FFIType ")" [function, hook(FFI.call_variadic)]
In the third variant,
#ffiCall(IsVariadic, Address, Args, ArgTypes, NFixed, ReturnType) takes
a boolean indicating whether the function is variadic or not, an integer
address of a function, a List of Bytes containing the arguments of the
call, a List of FFITypes containing the types of the fixed parameters of the
call followed by the types of the variadic arguments of the call, if any, an
Int stating how many of the arguments of the call are fixed, and an
FFIType containing the return type of the function, and returns the return
value of the function as a Bytes.
  syntax Bytes ::= "#ffiCall" "(" Bool "," Int "," List "," List "," Int "," FFIType ")" [function]

  rule #ffiCall(false, Addr::Int, Args::List, Types::List, _, Ret::FFIType) => #ffiCall(Addr, Args, Types, Ret)
  rule #ffiCall(true, Addr::Int, Args::List, Types::List, NFixed::Int, Ret::FFIType) => #ffiCall(Addr, Args, range(Types, 0, size(Types) -Int NFixed), range(Types, NFixed, 0), Ret)
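The second rule splits the single type list into its fixed and variadic parts with two calls to range. The Python sketch below models that split. It assumes that range(L, F, B) drops F elements from the front of L and B elements from the back; `k_range` and `split_types` are names made up for this illustration.

```python
def k_range(items, from_front, from_back):
    """Model of range(List, Int, Int): drop leading and trailing elements."""
    return items[from_front:len(items) - from_back]

def split_types(types, n_fixed):
    """Split a type list the way the variadic #ffiCall rule does."""
    fixed = k_range(types, 0, len(types) - n_fixed)  # keep the first n_fixed
    variadic = k_range(types, n_fixed, 0)            # keep the rest
    return fixed, variadic

# A printf-like call: one fixed pointer parameter, two variadic arguments.
print(split_types(["#pointer", "#sint", "#double"], 1))
# (['#pointer'], ['#sint', '#double'])
```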
The FFI module provides a mechanism to look up any function symbol and return
that function's address.
syntax Int ::= "#functionAddress" "(" String ")" [function, hook(FFI.address)]
Most memory used by the LLVM backend to represent terms is managed
automatically via garbage collection. However, a consequence of this is that
a particular term does not have a fixed address across its entire lifetime
in most cases. Sometimes this is undesirable, especially if you intend for
the address of the memory to be taken by the semantics or if you intend
to pass this memory directly to native code. As a result, the FFI module
exposes the following unsafe APIs for memory management. Note that use of
these APIs leaves the burden of memory management completely on the user,
and thus misuse of these functions can lead to things like use-after-free
and other memory corruption bugs.
#alloc(Key, Size, Align) will allocate Size bytes with an alignment
requirement of Align (which must be a power of two), and return it as a
Bytes term. The memory is uniquely identified by its key and that key will
be used later to free the memory. The memory is not implicitly freed by garbage
collection; failure to call #free on the memory at a later date can lead to
memory leaks.
syntax Bytes ::= "#alloc" "(" KItem "," Int "," Int ")" [function, hook(FFI.alloc)]
#address(B) will return an Int representing the address of the first byte of
B, which must be a Bytes. Unless the Bytes term was allocated by #alloc,
the return value is unspecified and may not be the same across multiple
invocations on the same byte buffer. However, it is guaranteed that memory
allocated by #alloc will have the same address throughout its lifetime.
syntax Int ::= "#address" "(" Bytes ")" [function, hook(FFI.bytes_address)]
#free(Key) will free the memory of the Bytes object that was allocated
by a previous call to #alloc. If Key was not used in a previous call to
#alloc, or the memory was already freed, no action is taken. It will generate
undefined behavior if the Bytes term returned by the previous call to
#alloc is still referenced by any other term in the configuration or a
currently evaluating rule. The function returns .K.
syntax K ::= "#free" "(" KItem ")" [function, hook(FFI.free)]
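The bookkeeping contract of #alloc and #free (memory is identified by a key, and freeing an unknown or already freed key does nothing) can be modelled in a few lines of Python. This is a toy model of the described behaviour, not of the LLVM backend: `KeyedAllocator` is a hypothetical class, it does not enforce real alignment, and overwriting on a reused key is an assumption because the text does not specify that case.

```python
class KeyedAllocator:
    """Toy model of key-addressed manual allocation."""

    def __init__(self):
        self._live = {}  # key -> buffer that has not been freed yet

    def alloc(self, key, size, align):
        # Align must be a power of two, as required for #alloc.
        if align <= 0 or align & (align - 1) != 0:
            raise ValueError("alignment must be a power of two")
        buffer = bytearray(size)
        self._live[key] = buffer  # assumption: a reused key overwrites
        return buffer

    def free(self, key):
        # Unknown or already freed key: no action is taken.
        self._live.pop(key, None)

    def live_keys(self):
        return set(self._live)

allocator = KeyedAllocator()
block = allocator.alloc("scratch", 16, 8)
allocator.free("scratch")
allocator.free("scratch")      # second free is a no-op
allocator.free("never-used")   # so is freeing an unknown key
print(len(block), allocator.live_keys())
```

Unlike this model, the real API leaves a dangling reference behind if the Bytes term is still in use after #free, which is why the text warns about undefined behavior.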
#nativeRead(Addr, Mem) will read native memory at address Addr into Mem,
reading exactly lengthBytes(Mem) bytes. This will generate undefined behavior
if Addr does not point to a readable segment of memory at least
lengthBytes(Mem) bytes long.
syntax K ::= "#nativeRead" "(" Int "," Bytes ")" [function, hook(FFI.read)]
#nativeWrite(Addr, Mem) will write the contents of Mem to native memory at
address Addr. The memory will be read prior to being written, and a write
will only happen if the memory has a different value than the current value of
Mem. This will generate undefined behavior if Addr does not point to a
readable segment of memory at least lengthBytes(Mem) bytes long, or if the
memory at address Addr has a different value than currently contained in
Mem, and the memory in question is not writeable.
  syntax K ::= "#nativeWrite" "(" Int "," Bytes ")" [function, hook(FFI.write)]
endmodule
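The read-before-write behaviour described for #nativeWrite can be tried out safely with Python's ctypes module. This is an analogy, not the K hook: `native_read` and `native_write` are hypothetical helpers, and unlike #nativeRead, which fills an existing Bytes term, the Python reader returns a new bytes object.

```python
import ctypes

def native_read(addr, length):
    """Copy length bytes starting at the raw address addr."""
    return ctypes.string_at(addr, length)

def native_write(addr, mem):
    """Write mem to addr, but only if the current contents differ."""
    if ctypes.string_at(addr, len(mem)) != mem:
        ctypes.memmove(addr, mem, len(mem))

# A ctypes buffer keeps a fixed address for as long as it is alive,
# which plays the role of memory obtained from #alloc.
buffer = ctypes.create_string_buffer(4)
address = ctypes.addressof(buffer)

native_write(address, b"\x01\x02\x03\x04")
print(native_read(address, 4))
```

As with the K functions, passing an address that does not point to valid memory of sufficient length is undefined behavior and will typically crash the process.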
K provides builtin support for reading/writing to JSON. While the JSON-SYNTAX
module is not precisely the syntax of JSON (utilizing K's syntax for strings,
integers, and floating point numbers rather than the syntax used by JSON),
you can still convert directly to/from the actual syntax of JSON using
the JSON2String and String2JSON hooks.
module JSON-SYNTAX
  imports INT-SYNTAX
  imports STRING-SYNTAX
  imports BOOL-SYNTAX
  imports FLOAT-SYNTAX

  syntax JSONs ::= List{JSON,","} [symbol(JSONs)]
  syntax JSONKey ::= String
  syntax JSON ::= "null" [symbol(JSONnull)]
                | String | Int | Float | Bool
                | JSONKey ":" JSON [symbol(JSONEntry)]
                | "{" JSONs "}" [symbol(JSONObject)]
                | "[" JSONs "]" [symbol(JSONList)]
endmodule
JSON and String
Given a string written in valid JSON, you can convert it to the JSON
sort with the String2JSON function. Assuming the user has not extended
the syntax of the JSON sort with their own constructors, any term of sort
JSON can also be converted to a String using the JSON2String function.
module JSON
  imports JSON-SYNTAX

  syntax String ::= JSON2String(JSON) [function, symbol(JSON2String), hook(JSON.json2string)]
  syntax JSON ::= String2JSON(String) [function, symbol(String2JSON), hook(JSON.string2json)]
endmodule
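For readers who know Python's json module, String2JSON and JSON2String play roughly the roles of `json.loads` and `json.dumps`. The sketch below is only that analogy. The `object_pairs_hook=list` variant keeps each object as an ordered list of key/value pairs, which is closer in spirit to the JSONs list of JSONEntry terms than a Python dict is.

```python
import json

text = '{"name": "K", "tags": ["rewrite", "semantics"], "stable": true, "extra": null}'

# String -> structured value -> String, analogous to
# JSON2String(String2JSON(text)).
value = json.loads(text)
round_tripped = json.dumps(value)
print(round_tripped)

# Keep objects as lists of (key, value) pairs instead of dicts.
pairs = json.loads(text, object_pairs_hook=list)
print(pairs[0])   # ('name', 'K')
```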
K provides support for arbitrary-precision rational numbers represented as a
quotient between two integers. The sort representing these values is Rat.
Int is a subsort of Rat, and it is guaranteed that any integer will be
represented as an Int and can be matched as such on the left hand side
of rules. K also supports the usual arithmetic operators over rational numbers.
module RAT-SYNTAX
  imports INT-SYNTAX
  imports private BOOL

  syntax Rat

  syntax Rat ::= Int
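You can raise a rational number to an integer power, multiply or divide two
rational numbers, and add or subtract two rational numbers: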
  syntax Rat ::= left:
                 Rat "^Rat" Int [function, total, symbol(_^Rat_), smtlib(ratpow), hook(RAT.pow)]
               > left:
                 Rat "*Rat" Rat [function, total, symbol(_*Rat_), left, smtlib(ratmul), hook(RAT.mul)]
               | Rat "/Rat" Rat [function, symbol(_/Rat_), left, smtlib(ratdiv), hook(RAT.div)]
               > left:
                 Rat "+Rat" Rat [function, total, symbol(_+Rat_), left, smtlib(ratadd), hook(RAT.add)]
               | Rat "-Rat" Rat [function, total, symbol(_-Rat_), left, smtlib(ratsub), hook(RAT.sub)]
You can determine whether two rational numbers are equal or unequal, or
whether one is less than, less than or equal to, greater than, or greater than
or equal to the other:
  syntax Bool ::= Rat "==Rat" Rat [function, total, symbol(_==Rat_), smtlib(rateq), hook(RAT.eq)]
                | Rat "=/=Rat" Rat [function, total, symbol(_=/=Rat_), smtlib(ratne), hook(RAT.ne)]
                | Rat ">Rat" Rat [function, total, symbol(_>Rat_), smtlib(ratgt), hook(RAT.gt)]
                | Rat ">=Rat" Rat [function, total, symbol(_>=Rat_), smtlib(ratge), hook(RAT.ge)]
                | Rat "<Rat" Rat [function, total, symbol(_<Rat_), smtlib(ratlt), hook(RAT.lt)]
                | Rat "<=Rat" Rat [function, total, symbol(_<=Rat_), smtlib(ratle), hook(RAT.le)]
You can compute the minimum and maximum of two rational numbers:
  syntax Rat ::= minRat(Rat, Rat) [function, total, symbol(minRat), smtlib(ratmin), hook(RAT.min)]
               | maxRat(Rat, Rat) [function, total, symbol(maxRat), smtlib(ratmax), hook(RAT.max)]
You can convert a rational number to the nearest floating point number that
is representable in a Float with a specified number of precision and exponent
bits:
  syntax Float ::= Rat2Float(Rat, precision: Int, exponentBits: Int) [function]
endmodule
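Python's `fractions.Fraction` offers a convenient way to check expectations about these operators, since it is also an arbitrary-precision rational type. The correspondence in the comments is an informal analogy, not a statement about how the RAT hooks are implemented; in particular, `float()` always targets an IEEE double, which would correspond to Rat2Float with 53 bits of precision and 11 exponent bits.

```python
from fractions import Fraction

half, third = Fraction(1, 2), Fraction(1, 3)

print(half + third)     # like 1 /Rat 2 +Rat 1 /Rat 3   -> 5/6
print(half * third)     # like *Rat                      -> 1/6
print(half / third)     # like /Rat                      -> 3/2
print(half ** -2)       # like ^Rat with a negative Int  -> 4
print(half > third)     # like >Rat                      -> True
print(min(half, third)) # like minRat                    -> 1/3

# A rational whose denominator is 1 is just an integer, mirroring the
# guarantee that integers are always represented as Int.
print((half + half).denominator == 1)   # True

# Nearest double to 1/3.
print(float(third))
```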
The remainder of this file consists of an implementation in K of the
operations listed above. Users of the RAT module should not use any of the
syntax defined in any of these modules.
As a point of reference for users, it is worth noting that rational numbers
are normalized to a canonical form by this module, with the canonical form
bearing the property that it is either an Int, or a pair of integers
I /Rat J such that
I =/=Int 0 andBool J >=Int 2 andBool gcdInt(I, J) ==Int 1
is always true.
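The canonical form can be illustrated with a short Python function that normalizes a numerator and denominator the same way. This is a sketch written for this explanation, not a translation generated from the K rules: `make_rat` is a hypothetical name, an integer result is returned as a plain int, and a proper fraction as a pair.

```python
from math import gcd

def make_rat(i, j):
    """Normalize i/j to an int or to a pair (n, d) with d >= 2, gcd(n, d) == 1."""
    if j == 0:
        raise ZeroDivisionError("division by zero is undefined")
    if i == 0:
        return 0
    d = gcd(i, j)        # math.gcd works on absolute values, so d > 0
    if j < 0:            # move the sign into the numerator
        i, j = -i, -j
    if j == d:           # denominator divides numerator: result is an integer
        return i // d
    return (i // d, j // d)

print(make_rat(6, -4))   # (-3, 2)
print(make_rat(-4, -2))  # 2
print(make_rat(0, 7))    # 0
```

Every pair returned satisfies the invariant stated above: the numerator is nonzero, the denominator is at least 2, and the two are coprime.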
module RAT-COMMON
  imports RAT-SYNTAX

  // invariant of < I , J >Rat : I =/= 0, J >= 2, and I and J are coprime
  syntax Rat ::= "<" Int "," Int ">Rat" [format(%2 /Rat %4)]
endmodule

module RAT-SYMBOLIC [symbolic]
  imports private RAT-COMMON
  imports ML-SYNTAX
  imports private BOOL

  rule #Ceil(@R1:Rat /Rat @R2:Rat) => {(@R2 =/=Rat 0) #Equals true} #And #Ceil(@R1) #And #Ceil(@R2) [simplification]
endmodule

module RAT-KORE
  imports private RAT-COMMON
  imports private K-EQUAL

  /*
   * equalities
   */

  // NOTE: the two rules below may not work correctly in non-kore backends

  rule R ==Rat S => R ==K S
  rule R =/=Rat S => R =/=K S
endmodule

module RAT [private]
  imports private RAT-COMMON
  imports public RAT-SYMBOLIC
  imports public RAT-KORE
  imports public RAT-SYNTAX
  imports private INT
  imports private BOOL

  /*
   * arithmetic
   */

  rule < I , I' >Rat +Rat < J , J' >Rat => ((I *Int J') +Int (I' *Int J)) /Rat (I' *Int J')
  rule I:Int +Rat < J , J' >Rat => ((I *Int J') +Int J) /Rat J'
  rule < J , J' >Rat +Rat I:Int => I +Rat < J , J' >Rat
  rule I:Int +Rat J:Int => I +Int J

  rule < I , I' >Rat *Rat < J , J' >Rat => (I *Int J) /Rat (I' *Int J')
  rule I:Int *Rat < J , J' >Rat => (I *Int J) /Rat J'
  rule < J , J' >Rat *Rat I:Int => I *Rat < J , J' >Rat
  rule I:Int *Rat J:Int => I *Int J

  rule < I , I' >Rat /Rat < J , J' >Rat => (I *Int J') /Rat (I' *Int J)
  rule I:Int /Rat < J , J' >Rat => (I *Int J') /Rat J
  rule < I , I' >Rat /Rat J:Int => I /Rat (I' *Int J) requires J =/=Int 0
  rule I:Int /Rat J:Int => makeRat(I, J) requires J =/=Int 0

  // derived

  rule R -Rat S => R +Rat (-1 *Rat S)

  // normalize

  syntax Rat ::= makeRat(Int, Int) [function]
               | makeRat(Int, Int, Int) [function]

  rule makeRat(0, J) => 0 requires J =/=Int 0
  rule makeRat(I, J) => makeRat(I, J, gcdInt(I,J)) requires I =/=Int 0 andBool J =/=Int 0

  // makeRat(I, J, D) is defined when I =/= 0, J =/= 0, D > 0, and D = gcd(I,J)
  rule makeRat(I, J, D) => I /Int D requires J ==Int D // implies J > 0 since D > 0
  rule makeRat(I, J, D) => < I /Int D , J /Int D >Rat requires J >Int 0 andBool J =/=Int D
  rule makeRat(I, J, D) => makeRat(0 -Int I, 0 -Int J, D) requires J <Int 0

  // gcdInt(a,b) computes the gcd of |a| and |b|, which is positive.
  syntax Int ::= gcdInt(Int, Int) [function, public]

  rule gcdInt(A, 0) => A requires A >Int 0
  rule gcdInt(A, 0) => 0 -Int A requires A <Int 0
  rule gcdInt(A, B) => gcdInt(B, A %Int B) requires B =/=Int 0 // since |A %Int B| = |A| %Int |B|

  /*
   * exponentiation
   */

  rule _ ^Rat 0 => 1
  rule 0 ^Rat N => 0 requires N =/=Int 0
  rule < I , J >Rat ^Rat N => powRat(< I , J >Rat, N) requires N >Int 0
  rule X:Int ^Rat N => X ^Int N requires N >Int 0
  rule X ^Rat N => (1 /Rat X) ^Rat (0 -Int N) requires X =/=Rat 0 andBool N <Int 0

  // exponentiation by squaring

  syntax Rat ::= powRat(Rat, Int) [function]

  // powRat(X, N) is defined when X =/= 0 and N > 0
  rule powRat(X, 1) => X
  rule powRat(X, N) => powRat(X *Rat X, N /Int 2) requires N >Int 1 andBool N %Int 2 ==Int 0
  rule powRat(X, N) => powRat(X, N -Int 1) *Rat X requires N >Int 1 andBool N %Int 2 =/=Int 0

  /*
   * inequalities
   */

  rule R >Rat S => R -Rat S >Rat 0 requires S =/=Rat 0
  rule < I , _ >Rat >Rat 0 => I >Int 0
  rule I:Int >Rat 0 => I >Int 0

  // derived

  rule R >=Rat S => notBool R <Rat S
  rule R <Rat S => S >Rat R
  rule R <=Rat S => S >=Rat R

  rule minRat(R, S) => R requires R <=Rat S
  rule minRat(R, S) => S requires S <=Rat R

  rule maxRat(R, S) => R requires R >=Rat S
  rule maxRat(R, S) => S requires S >=Rat R

  syntax Float ::= #Rat2Float(Int, Int, Int, Int) [function, hook(FLOAT.rat2float)]
  rule Rat2Float(Num:Int, Prec:Int, Exp:Int) => #Rat2Float(Num, 1, Prec, Exp)
  rule Rat2Float(< Num, Dem >Rat, Prec, Exp) => #Rat2Float(Num, Dem, Prec, Exp)
endmodule
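The exponentiation rules use the classic square-and-multiply scheme, which needs a number of multiplications proportional to the number of bits of the exponent. The Python sketch below follows the same case analysis on top of `fractions.Fraction`; `pow_rat` and `rat_pow` are illustrative names, and the behaviour simply mirrors the rules as written (including the result 0 for a zero base with any nonzero exponent).

```python
from fractions import Fraction

def pow_rat(x, n):
    """Square-and-multiply for x != 0 and n > 0."""
    if n == 1:
        return x
    if n % 2 == 0:
        return pow_rat(x * x, n // 2)   # x^n = (x*x)^(n/2) for even n
    return pow_rat(x, n - 1) * x        # x^n = x^(n-1) * x for odd n

def rat_pow(x, n):
    """Exponentiation with the same case split as the ^Rat rules."""
    x = Fraction(x)
    if n == 0:
        return Fraction(1)
    if x == 0:
        return Fraction(0)
    if n < 0:
        return rat_pow(1 / x, -n)       # negative exponent: invert the base
    return pow_rat(x, n)

print(rat_pow(Fraction(2, 3), 3))    # 8/27
print(rat_pow(Fraction(2, 3), -2))   # 9/4
```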
One of the traditional ways in which functional languages are given operational
semantics is via substitution. In particular, you can view a function as
declaring a particular bound variable, the parameter of the function, as well
as the body of the function, within which both bound and free variables can
occur, and implement the process of beta-reduction (one of the axioms of the
lambda calculus) by means of a substitution operator which is aware of the
difference between free variables and bound variables and prevents variable
capture.
In K this is implemented using two mechanisms: the KVar sort and the
binder attribute.
KVar Sort
K introduces a new hooked sort, KVar, which the substitution operator
(defined below) understands in a particular way. The syntax of KVar is the
same as for sort Id in DOMAINS, but with a different sort name. Similarly,
some of the same operators are defined over KVar which are defined for Id,
such as conversion from String to KVar and support for the !Var:KVar
syntax.
A KVar is simply an identifier with special meaning during substitution.
KVars must begin with a letter or underscore,
and can be followed by zero or more letters, numbers, or underscores.
module KVAR-SYNTAX-PROGRAM-PARSING
  imports BUILTIN-ID-TOKENS

  syntax KVar ::= r"[A-Za-z\\_][A-Za-z0-9\\_]*" [prec(1), token]
                | #LowerId [token]
                | #UpperId [token]
endmodule

module KVAR-SYNTAX
  syntax KVar [token, hook(KVAR.KVar)]
endmodule

module KVAR-COMMON
  imports KVAR-SYNTAX
  imports private STRING

  syntax KVar ::= String2KVar (String) [function, total, hook(STRING.string2token)]
  syntax KVar ::= freshKVar(Int) [freshGenerator, function, total, private]

  rule freshKVar(I:Int) => String2KVar("_" +String Int2String(I))
endmodule

module KVAR
  imports KVAR-COMMON
endmodule
binder Attribute
A production can be given the attribute binder. Such a production must have
at least two nonterminals. The first nonterminal from left to right must be of
sort KVar, and contains the bound variable. The last nonterminal from left
to right contains the term that is bound. For example, I could describe lambdas
in the lambda calculus with the production
syntax Val ::= "lambda" KVar "." Exp [binder].
K provides a hooked implementation of substitution, currently only implemented
on the Java and LLVM backends. Two variants exist: the first substitutes
a single KVar for a single KItem. The second takes a Map with KVar
keys and KItem values, and substitutes each element in the map atomically.
Internally, this is implemented in the LLVM backend by a combination of
de Bruijn indices for bound variables and names for free variables. Free
variables are also sometimes given a unique numeric identifier in order to
prevent capture, and the rewriter will automatically assign unique names to
such identifiers when rewriting finishes. The names assigned will always begin
with the original name of the variable and be followed by a unique integer
suffix. However, the names assigned after rewriting finishes might be different
from the names that would be assigned if rewriting were to halt prematurely,
for example due to krun --depth.
module SUBSTITUTION
  imports private MAP
  imports KVAR

  syntax {Sort} Sort ::= Sort "[" KItem "/" KItem "]" [function, hook(SUBSTITUTION.substOne), impure]
  syntax {Sort} Sort ::= Sort "[" Map "]" [function, hook(SUBSTITUTION.substMany), impure]
endmodule
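The mixed representation mentioned above (de Bruijn indices for bound variables, names for free variables) can be illustrated independently of K. The Python sketch below is a textbook conversion, not the backend's code; the tuple encoding of terms and the function name `to_de_bruijn` are choices made for this example.

```python
# Named terms: ("var", name) | ("lam", name, body) | ("app", fun, arg)

def to_de_bruijn(term, bound=()):
    """Replace bound variable names by indices; keep free variables named.

    bound lists the enclosing binders, innermost first, so index 0 refers
    to the nearest enclosing lambda.
    """
    kind = term[0]
    if kind == "var":
        name = term[1]
        if name in bound:
            return ("bound", bound.index(name))
        return ("free", name)
    if kind == "lam":
        return ("lam", to_de_bruijn(term[2], (term[1],) + bound))
    return ("app", to_de_bruijn(term[1], bound), to_de_bruijn(term[2], bound))

# lambda x . lambda y . x   and   lambda a . lambda b . a
first = ("lam", "x", ("lam", "y", ("var", "x")))
second = ("lam", "a", ("lam", "b", ("var", "a")))
print(to_de_bruijn(first))                           # ('lam', ('lam', ('bound', 1)))
print(to_de_bruijn(first) == to_de_bruijn(second))   # True
```

Because alpha-equivalent terms get the same representation, a bound variable can never be confused with a free variable of the term being substituted, which is exactly the capture problem substitution has to avoid.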
Here you will learn how to use the K tool to define languages by means of a series of screencast movies. It is recommended to do these in the indicated order, because K features already discussed in a previous language definition will likely not be rediscussed in later definitions. The screencasts follow quite closely the structure of the files under the tutorial folder in the K tool distribution. If you'd rather follow the instructions there and do the tutorial exercises yourself, then go back to https://kframework.org and download the K tool, if you have not done it already. Or, you can first watch the screencasts below and then do the exercises, or do them in parallel.
Make sure you watch the K overview video before you do the K tutorial:
Here you will learn how to define a very simple functional language in K and the basics of how to use the K tool. The language is a call-by-value variant of lambda calculus with builtins and mu, and its definition is based on substitution.
Here you will learn how to define a very simple, prototypical textbook C-like imperative language, called IMP, and several new features of the K tool.
Here you will learn how to define constructs which abruptly change the execution control, as well as how to define functional languages using environments and closures. LAMBDA++ extends the LAMBDA language above with a callcc construct.
Here you will learn how to refine configurations, how to generate fresh elements, how to tag syntactic constructs and rules, how to exhaustively search the space of non-deterministic or concurrent program executions, etc. IMP++ extends the IMP language above with increment, blocks and locals, dynamic threads, input/output, and abrupt termination.
Here you will learn how to define various kinds of type systems following various approaches or styles using K.
Here you will learn a few other K features, and better understand how features that you have already seen work.
Here you will learn how to design imperative programming languages using K. SIMPLE is an imperative language with functions, threads, pointers, exceptions, multi-dimensional arrays, etc. We first define an untyped version of SIMPLE, then a typed version. For the typed version, we define both a static and a dynamic semantics.
Here you will learn how to design object-oriented programming languages using K. KOOL is an object-oriented language that extends SIMPLE with classes and objects. We first define an untyped version of KOOL, then a typed version, with both a dynamic and a static semantics.
Here you will learn how to design functional programming languages using K. FUN is a higher-order functional language with general let, letrec, pattern matching, references, lists, callcc, etc. We first define an untyped version of FUN, then a let-polymorphic type inferencer.
Here you will learn how to design a logic programming language using K.
Go back to https://kframework.org for further links, the K tool and contact information.
We start by introducing the basic features of K by means of a series
of very simple languages. The objective here is neither to learn those
languages nor to study their underlying paradigm, but simply to learn K.
Here you will learn how to define a very simple language in K and the basics
of how to use the K tool. The language is a variant of call-by-value lambda
calculus and its definition is based on substitution. Specifically, you will
learn the following:
This folder contains several lessons, each adding new features to LAMBDA.
Here we define our first K module, which contains the initial syntax of the
LAMBDA language, and learn how to use the basic K commands.
Let us create an empty working folder, and open a terminal window
(to the left) and an editor window (to the right). We will edit our K
definition in the right window in a file called lambda.k, and will call
the K tool commands in the left window.
Let us start by defining a K module, containing the syntax of LAMBDA.
K modules are introduced with the keywords module ... endmodule.
The keyword syntax adds new productions to the syntax grammar, using a
BNF-like notation.
Terminals are enclosed in double-quotes, like strings.
You can define multiple productions for the same non-terminal in the same
syntax declaration using the | separator.
Productions can have attributes, which are enclosed in square brackets.
The attribute left tells the parser that we want the lambda application to be
left associative. For example, a b c d will then parse as (((a b) c) d).
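Left associativity amounts to folding the sequence of operands from the left. The following Python sketch shows that shape for a juxtaposition-only input; it is an illustration, not the K parser, and `parse_application` and `show` are hypothetical helpers that assume atoms separated by whitespace.

```python
from functools import reduce

def parse_application(source):
    """Group whitespace-separated atoms as a left-associative application."""
    atoms = source.split()
    # reduce folds from the left: ((a b) c) d
    return reduce(lambda function, argument: (function, argument), atoms)

def show(tree):
    """Print the tree with explicit parentheses."""
    if isinstance(tree, str):
        return tree
    return "(" + show(tree[0]) + " " + show(tree[1]) + ")"

print(show(parse_application("a b c d")))   # (((a b) c) d)
```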
The attribute bracket tells the parser to not generate a node for the
parenthesis production in the abstract syntax trees associated to programs.
In other words, we want to allow parentheses to be used for grouping, but we
do not want to bother to give them their obvious (ignore) semantics.
In our variant of lambda calculus defined here, identifiers and lambda
abstractions are meant to be irreducible, that is, are meant to be values.
However, so far Val is just another non-terminal, just like Exp,
without any semantic meaning. It will get a semantic meaning later.
After we are done typing our definition in the file lambda.k, we can kompile
it with the command:
kompile lambda.k
If we get no errors then a parser has been generated. This parser will be
called from now on by default by the krun tool. To see whether and how the
parser works, we are going to write some LAMBDA programs and store them in
files with the extension .lambda.
Let us create a file identity.lambda, which contains the identity lambda
abstraction:
lambda x . x
Now let us call krun on identity.lambda:
krun identity.lambda
Make sure you call the krun command from the folder containing your language
definition (otherwise type krun --help to learn how to pass a language
definition as a parameter to krun). The krun command produces the output:
<k> lambda x . x </k>
If you see such an output it means that your program has been parsed (and then
pretty printed) correctly. If you want to see the internal abstract syntax
tree (AST) representation of the parsed program, which we call the K AST, then
type kast in the command instead of krun:
kast identity.lambda
You should normally never need to see this internal representation in your
K definitions, so do not get scared (yes, it is ugly for humans, but it is
very convenient for tools).
Note that krun placed the program in a <k> ... </k> cell. In K, computations
happen only in cells. If you do not define a configuration in your definition,
as is the case here, then a configuration will be created automatically for you
which contains only one cell, the default k cell, which holds the program.
Next, let us create a file free-variable-capture.lambda, which contains an
expression which, in order to execute correctly in a substitution-based
semantics of LAMBDA, the substitution operation needs to avoid
variable-capture:
a (((lambda x.lambda y.x) y) z)
Next, file closed-variable-capture.lambda shows an expression which also
requires a capture-free substitution, but this expression is closed (that is,
it has no free variables) and all its bound variables are distinct (I believe
this is the smallest such expression):
(lambda z.(z z)) (lambda x.lambda y.(x y))
Finally, the file omega.lambda contains the classic omega combinator
(or closed expression), which is the smallest expression which loops forever
(not now, but after we define the semantics of LAMBDA):
(lambda x.(x x)) (lambda x.(x x))
Feel free to define and parse several other LAMBDA programs to get a feel for
how the parser works. Parse also some incorrect programs, to see how the
parser generates error messages.
In the next lesson we will see how to define semantic rules that iteratively
rewrite expressions over the defined syntax until they evaluate to a result.
This way, we obtain our first programming language defined using K.
We here learn how to include a predefined module (SUBSTITUTION), how to
use it to define a K rule (the characteristic rule of lambda calculus),
and how to make proper use of variables in rules.
Let us continue our lambda.k definition started in the previous lesson.
The requires keyword takes a .k file containing language features that
are needed for the current definition, which can be found in the
k-distribution/include/kframework/builtin folder. Thus, the command
requires "substitution.k"
says that the subsequent definition of LAMBDA needs the generic substitution,
which is predefined in file substitution.k under the folder
k-distribution/include/kframework/builtin. Note that substitution can be defined itself in K,
although it uses advanced features that we have not discussed yet in this
tutorial, so it may not be easy to understand now.
Using the imports keyword, we can now modify LAMBDA to import the module
SUBSTITUTION, which is defined in the required substitution.k file.
Now we have all the substitution machinery available for our definition.
However, since our substitution is generic, it cannot know which language
constructs bind variables, and what counts as a variable, even though this
information is critical in order to correctly solve the variable capture
problem. Thus, you have to tell the substitution that your lambda construct
is meant to be a binder, and that your Id terms should be treated as variables
for substitution. The former is done using the attribute binder.
By default, binder binds all the variables occurring anywhere in the first
argument of the corresponding syntactic construct within its other arguments;
you can configure which arguments are bound where, but that will be discussed
in subsequent lectures. To tell K which terms are meant to act as variables
for binding and substitution, we have to explicitly subsort the desired syntactic
categories to the builtin KVariable sort.
Now we are ready to define our first K rule. Rules are introduced with the
keyword rule and make use of the rewrite symbol, =>. In our case,
the rule defines the so-called lambda calculus beta-reduction, which
makes use of substitution in its right-hand side, as shown in lambda.k.
By convention, variables that appear in rules start with a capital letter
(the current implementation of the K tool may even enforce that).
Variables may be explicitly tagged with their syntactic category (also called
sort). If tagged, the matching term will be checked at run-time for
membership to the claimed sort. If not tagged, then no check will be made.
The former is safer, but involves the generation of a side condition to the
rule, so the resulting definition may execute slightly slower overall.
In our rule in lambda.k we tagged all variables with their sorts, so we chose
the safest path. Only the V variable really needs to be tagged there,
because we can prove (using other means, not the K tool, as the K tool is not
yet concerned with proving) that the first two variables will always have the
claimed sorts whenever we execute any expression that parses within our
original grammar.
Let us compile the definition and then run some programs. For example,
krun closed-variable-capture.lambda
yields the output
<k> lambda y . ((lambda x . (lambda y . (x y))) y) </k>
Notice that only certain programs reduce (some even yield non-termination,
such as omega.lambda
), while others do not. For example,
free-variable-capture.lambda
does not reduce its second argument expression
to y
, as we would expect. This is because the K rewrite rules between syntactic
terms do not apply anywhere they match. They only apply where they have been
given permission to apply by means of appropriate evaluation strategies of language
constructs, which is done using strictness attributes, evaluation contexts,
heating/cooling rules, etc., as discussed in the next lessons.
The next lesson will show how to add the desired evaluation strategies to
LAMBDA using strictness attributes.
Go to Lesson 3, LAMBDA: Evaluation Strategies using Strictness
Here we learn how to use the K strict
attribute to define desired evaluation
strategies. We will also learn how to tell K which terms are already
evaluated, so it does not attempt to evaluate them anymore and treats them
internally as results of computations.
Recall from the previous lecture that the LAMBDA program
free-variable-capture.lambda
was stuck, because K was not given permission
to evaluate the arguments of the lambda application construct.
You can use the attribute strict
to tell K that the corresponding construct
has a strict evaluation strategy, that is, that its arguments need to be
evaluated before the semantics of the construct applies. The order of
argument evaluation is purposely unspecified when using strict
, and indeed
the K tool allows us to detect all possible non-deterministic behaviors that
result from such intended underspecification of evaluation strategies. We will
learn how to do that when we define the IMP language later in this tutorial;
we will also learn how to enforce a particular order of evaluation.
In order for the above strictness declaration to work effectively and
efficiently, we need to tell the K tool which expressions are meant to be
results of computations, so that it will not attempt to evaluate them anymore.
One way to do it is to make Val
a syntactic subcategory of the builtin
KResult
syntactic category. Since we use the same K parser to also parse
the semantics, we use the same syntax
keyword to define additional syntax
needed exclusively for the semantics (like KResult
s). See lambda.k
.
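A minimal sketch of these two changes, assuming the LAMBDA productions have the shape discussed in the previous lesson (the actual lambda.k may differ in layout or in additional attributes):

```k
syntax Exp ::= Val
             | Exp Exp       [strict, left]  // evaluate both arguments first
             | "(" Exp ")"   [bracket]

// Values are results: K will not try to evaluate them any further.
syntax KResult ::= Val
```

With `strict` on application, K is allowed to evaluate the function and the argument positions before the beta-reduction rule applies; the `KResult` subsort tells it when to stop.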
Compile again and then run some programs. They should all work as expected.
In particular, free-variable-capture.lambda
now evaluates to a y
.
We now have a complete and working semantic definition of call-by-value
lambda-calculus. While theoretically correct, our definition is not
easy to use and disseminate. In the next lessons we will learn how to
generate formatted documentation for LAMBDA and how to extend LAMBDA
in order to write human readable and interesting programs.
Go to Lesson 4, LAMBDA: Generating Documentation; Latex Attributes.
In this lesson we learn how to generate formatted documentation from K
language definitions. We also learn how to use Latex attributes to control
the formatting of language constructs, particularly of ones which have a
mathematical flavor and we want to display accordingly.
To enhance readability, we may want to replace the keyword lambda
by the
mathematical lambda symbol in the generated documentation. We can control
the way we display language constructs in the generated documentation
by associating Latex attributes with them.
This is actually quite easy. All we have to do is to associate a latex
attribute to the production defining the construct in question, following
the Latex syntax for defining new commands (or macros).
In our case, we associate the attribute latex(\lambda{#1}.{#2})
to the
production declaring the lambda abstraction (recall that in Latex, #n
refers
to the n-th argument of the defined new command).
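As a sketch, the production for lambda abstraction with this attribute attached might read as follows (assuming the production shape from the earlier lessons; only the `latex` attribute text is taken from this lesson):

```k
syntax Val ::= Id
             | "lambda" Id "." Exp  [binder, latex(\lambda{#1}.{#2})]
```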
We will later see, in Lesson 9, that we can add arbitrarily complex Latex
comments and headers to our language definitions, which give us maximum
flexibility in formatting our language definitions.
Now we have a simple programming language, with a nice documentation. However,
it is not easy to write interesting programs in this language. Almost all
programming languages build upon existing data-types and libraries. The K
tool provides a few of these (and you can add more).
In the next lesson we show how we can add builtin integers and Booleans to
LAMBDA, so we can start to evaluate meaningful expressions.
We have already added the builtin identifiers (sort Id
) to LAMBDA expressions,
but those had no operations on them. In this lesson we add integers and
Booleans to LAMBDA, and extend the builtin operations on them into
corresponding operations on LAMBDA expressions. We will also learn how to add
side conditions to rules, to limit the number of instances where they can
apply.
The K tool provides several builtins, which are automatically included in all
definitions. These can be used in the languages that we define, typically by
including them in the desired syntactic categories. You can also define your
own builtins in case the provided ones are not suitable for your language
(e.g., the provided builtin integers and operations on them are arbitrary
precision).
For example, to add integers and Booleans as values to our LAMBDA, we have to
add the productions
syntax Val ::= Int | Bool
Int
and Bool
are the nonterminals that correspond to these builtins.
To make use of these builtins, we have to add some arithmetic operation
constructs to our language. We prefer to use the conventional infix notation
for these, and the usual precedences (i.e., multiplication and division bind
tighter than addition, which binds tighter than relational operators).
Inspired by SDF, we use >
instead of
|
to state that all the previous constructs bind tighter than all the
subsequent ones. See lambda.k
.
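The precedence structure just described could be sketched as below. The choice of operators and of `left` attributes is an assumption made for illustration; only the use of `>` to separate precedence levels and the ordering of the levels come from the text above.

```k
syntax Val ::= Int | Bool

syntax Exp ::= Exp "*" Exp    [strict, left]
             | Exp "/" Exp    [strict]
             > Exp "+" Exp    [strict, left]   // binds looser than * and /
             > Exp "<=" Exp   [strict]         // binds loosest
```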
The only thing left is to link the LAMBDA arithmetic operations to the
corresponding builtin operations, when their arguments are evaluated.
This can be easily done using trivial rewrite rules, as shown in lambda.k
.
In general, the K tool attempts to uniformly add the corresponding builtin
name as a suffix to all the operations over builtins. For example, the
addition over integers is an infix operation named +Int
.
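The linking rules mentioned above might look like this sketch. The operator names follow the suffix convention just described (`+Int` is named in the text; `*Int`, `/Int`, and `<=Int` are assumed by analogy), and the variable names are illustrative.

```k
// Each LAMBDA operation on evaluated arguments is delegated to the
// corresponding builtin operation on integers.
rule I1:Int * I2:Int  => I1 *Int I2
rule I1:Int / I2:Int  => I1 /Int I2
rule I1:Int + I2:Int  => I1 +Int I2
rule I1:Int <= I2:Int => I1 <=Int I2
```

Because the variables are tagged with sort Int, these rules can only apply once strictness has evaluated both arguments to integers.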
Compile the new lambda.k
definition and evaluate some simple arithmetic
expressions. For example, if arithmetic.lambda
is (1+2*3)/4 <= 1
, then
krun arithmetic.lambda
yields, as expected, true
. Note that the parser took the desired operation
precedence into account.
Let us now try to evaluate an expression which performs a wrong computation,
namely a division by zero. Consider the expression arithmetic-div-zero.lambda
which is 1/(2/3)
. Since division is strict and 2/3
evaluates to 0
, this
expression reduces to 1/0
, which further reduces to 1 /Int 0
by the rule for
division, which is now stuck (with the current back-end to the K tool).
In fact, depending upon the back-end that we use to execute K definitions and
in particular to evaluate expressions over builtins, 1 /Int 0
can evaluate to
anything. It just happens that the current back-end keeps it as an
irreducible term. Other K back-ends may reduce it to an explicit error
element, or issue a segmentation fault followed by a core dump, or throw an
exception, etc.
To avoid requesting the back-end to perform an illegal operation, we may use a
side condition in the rule of division, to make sure it only applies when the
denominator is non-zero.
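A guarded division rule could be sketched as follows. We assume here that side conditions are introduced with the keyword `requires` and that integer disequality is written `=/=Int`; depending on the version of the K tool, the keyword may be spelled differently (older releases used `when`).

```k
// Only delegate to the builtin division when the denominator is non-zero;
// otherwise the term 1 / 0 simply stays stuck in the LAMBDA syntax.
rule I1:Int / I2:Int => I1 /Int I2  requires I2 =/=Int 0
```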
Like in other operational formalisms, the role of the K side
conditions is to filter the number of instances of the rule. The notion
of a side condition comes from logics, where a sharp distinction is made
between a side condition (cheap) and a premise (expensive). Premises are
usually resolved using further (expensive) logical derivations, while side
conditions are simple (cheap) conditions over the rule meta-variables within
the underlying mathematical domains (which in K can be extended by the user,
as we will see in future lessons). Regarded as a logic, K derives rewrite
rules from other rewrite rules; therefore, the K side conditions cannot
contain other rewrites in them (using =>
). This contrasts with other rewrite
engines, for example Maude, which
allow conditional rules with rewrites in conditions.
The rationale behind this deliberate restriction in K is twofold:
Having builtin arithmetic is useful, but writing programs with just lambda
and arithmetic constructs is still a pain. In the next two lessons we will
add conditional (if_then_else
) and binding (let
and letrec
) constructs,
which will allow us to write nicer programs.
Go to Lesson 6, LAMBDA: Selective Strictness; Anonymous Variables.
We here show how to define selective strictness of language constructs,
that is, how to state that certain language constructs are strict only
in some arguments. We also show how to use anonymous variables.
We next define a conditional if
construct, which takes three arguments,
evaluates only the first one, and then reduces to either the second or the
third, depending on whether the first one evaluated to true or to false.
K allows us to define selective strictness using the same strict
attribute,
but passing it a list of numbers. The numbers correspond to the arguments
in which we want the defined construct to be strict. In our case,
syntax Exp ::= "if" Exp "then" Exp "else" Exp [strict(1)]
states that the conditional construct is strict in the first argument.
We can now assume that its first argument will eventually reduce to a value, so
we only write the following two semantic rules:
rule if true then E else _ => E
rule if false then _ else E => E
Thus, we assume that the first argument evaluates to either true
or false
.
Note the use of the anonymous variable _
. We use such variables purely for
structural reasons, to state that something is there but we don't care what.
An anonymous variable is therefore completely equivalent to a normal variable
which is unsorted and different from all the other variables in the rule. If
you use _
multiple times in a rule, they will all be considered distinct.
Compile lambda.k
and write and execute some interesting expressions making
use of the conditional construct. For example, the expression
if 2<=1 then 3/0 else 10
evaluates to 10
and will never evaluate 3/0
, thus avoiding an unwanted
division-by-zero.
In the next lesson we will introduce two new language constructs, called
let
and letrec
and conventionally found in functional programming
languages, which will allow us to already write interesting LAMBDA programs.
Go to Lesson 7, LAMBDA: Derived Constructs; Extending Predefined Syntax.
In this lesson we will learn how to define derived language constructs, that
is, ones whose semantics is defined completely in terms of other language
constructs. We will also learn how to add new constructs to predefined
syntactic categories.
When defining a language, we often want certain language constructs to be
defined in terms of other constructs. For example, a let-binding construct
of the form
let x = e in e'
is nothing but syntactic sugar for
(lambda x . e') e
This can be easily achieved with a rule, as shown in lambda.k
.
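A sketch of such a desugaring rule is given below. The production and the `macro` attribute (which asks the tool to apply the rule as a syntactic expansion rather than as a computation step) are assumptions about how lambda.k expresses this; the right-hand side is exactly the translation stated above.

```k
syntax Exp ::= "let" Id "=" Exp "in" Exp

// let X = E in E' is syntactic sugar for (lambda X . E') E
rule let X = E in E':Exp => (lambda X . E') E   [macro]
```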
Compile lambda.k
and write some programs using let
binders.
For example, consider a lets.lambda
program which takes arithmetic.lambda
and replaces each integer by a let-bound variable. It should evaluate to
true
, just like the original arithmetic.lambda
.
Let us now consider a more interesting program, namely one that calculates the
factorial of 10:
let f = lambda x . (
          (lambda t . lambda x . (t t x))
          (lambda f . lambda x . (if x <= 1 then 1 else (x * (f f (x + -1)))))
          x
        )
in (f 10)
This program follows a common technique to define fixed points in untyped
lambda calculus, based on passing a function to itself.
We may not like to define fixed-points following the approach above, because
it requires global changes in the body of the function meant to be recursive,
basically to pass it to itself (f f
in our case above). The approach below
isolates the fixed-point aspect of the function in a so-called fixed-point
combinator, which we call fix
below, and then apply it to the function
defining the body of the factorial, without any changes to it:
let fix = lambda f . (
            (lambda x . (f (lambda y . (x x y))))
            (lambda x . (f (lambda y . (x x y))))
          )
in let f = fix (lambda f . lambda x .
                  (if x <= 1 then 1 else (x * (f (x + -1)))))
   in (f 10)
Although the above techniques are interesting and powerful (indeed, untyped
lambda calculus is in fact Turing complete), programmers will probably not
like to write programs this way.
We can easily define a more complex derived construct, called letrec
and
conventionally encountered in functional programming languages, whose semantics
captures the fixed-point idea above. In order to keep its definition simple
and intuitive, we define a simplified variant of letrec
, namely one which only
allows us to define one recursive one-argument function. See lambda.k
.
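Since lambda.k is not shown here, the following is a reconstruction sketch of what such a letrec definition might look like, obtained by in-lining the fix combinator from the program above into a let. The reserved identifiers `$x` and `$y` and the `macro` attribute are assumptions consistent with the observations that follow.

```k
syntax Exp ::= "letrec" Id Id "=" Exp "in" Exp
syntax Id  ::= "$x" | "$y"

// F is bound to the fixed point of (lambda F . lambda X . E), using the
// same self-application pattern as the fix combinator shown earlier.
rule letrec F:Id X:Id = E in E'
  => let F =
       (lambda $x . ((lambda F . lambda X . E) (lambda $y . ($x $x $y))))
       (lambda $x . ((lambda F . lambda X . E) (lambda $y . ($x $x $y))))
     in E'                                                       [macro]
```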
There are two interesting observations here.
First, note that we have already in-lined the definition of the fix
combinator in the definition of the factorial, to save one application of the
beta reduction rule (and the involved substitution steps). We could have
in-lined the definition of the remaining let
, too, but we believe that the
current definition is easier to read.
Second, note that we extended the predefined Id
syntactic category with two
new constants, $x
and $y
. The predefined identifiers cannot start with
$
, so programs that will be executed with this semantics cannot possibly
contain the identifiers $x and $y. In other words, by adding them to Id they
become indirectly reserved for the semantics. This is indeed desirable,
because any possible uses of $x in the body of the function defined using
letrec would be captured by the lambda $x declaration in the definition of
letrec.
Using letrec
, we can now write the factorial program as elegantly as it can
be written in a functional language:
letrec f x = if x <= 1 then 1 else (x * (f (x + -1))) in (f 10)
In the next lesson we will discuss an alternative definition of letrec
, based
on another binder, mu
, specifically designed to define fixed points.
Here we learn how multiple language constructs that bind variables can
coexist. We will also learn about or recall another famous binder besides
lambda
, namely mu
, which can be used to elegantly define all kinds of
interesting fixed-point constructs.
The mu
binder has the same syntax as lambda, except that it replaces
lambda
with mu
.
Since mu
is a binder, in order for substitution to know how to deal with
variable capture in the presence of mu
, we have to tell it that mu
is a
binding construct, just like lambda. While we are at it, we
also give mu
its desired latex attribute.
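By analogy with the lambda production, the declaration of mu might be sketched as follows. The exact latex attribute text is an assumption modeled on the one given for lambda in Lesson 4.

```k
syntax Exp ::= "mu" Id "." Exp  [binder, latex(\mu{#1}.{#2})]
```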
The intuition for
mu x . e
is that it reduces to e
, but each free occurrence of x
in e
behaves
like a pointer that points back to mu x . e
.
With that in mind, let us postpone the definition of mu
and instead redefine
letrec F X = E in E'
as a derived construct, assuming mu
available. The
idea is to simply regard F
as a fixed-point of the function
lambda X . E
that is, to first calculate
mu F . lambda X . E
and then to evaluate E'
where F
is bound to this fixed-point:
let F = mu F . lambda X . E in E'
This new definition of letrec
may still look a bit tricky, particularly
because F
is bound twice, but it is much simpler and cleaner than our
previous definition. Moreover, now it is done in a type-safe manner
(this aspect goes beyond our objective in this tutorial).
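Written as a K rule, this redefinition could be sketched as below. As before, the production and the `macro` attribute are assumptions; the right-hand side is the term given above.

```k
syntax Exp ::= "letrec" Id Id "=" Exp "in" Exp

// F is the fixed point of lambda X . E, and E' is evaluated with F bound to it.
rule letrec F:Id X:Id = E in E' => let F = mu F . lambda X . E in E'   [macro]
```

Note that the reserved identifiers of the previous definition are no longer needed, since mu takes care of the self-reference.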
Let us now define the semantic rule of mu
.
The semantics of mu
is actually disarmingly simple. We just have to
substitute mu X . E
for each free occurrence of X
in E
:
mu X . E => E[(mu X . E) / X]
Compile lambda.k
and execute some recursive programs. They should now be
several times faster. Write a few more recursive programs, for example ones
for calculating the Ackermann function, for calculating the number of moves
needed to solve the Hanoi tower problem, etc.
We have defined our first programming language in K, which allows us to
write interesting functional programs. In the next lesson we will learn how
to fully document our language definition, in order to disseminate it, to ship
it to colleagues or friends, to publish it, to teach it, and so on.
Go to Lesson 9, LAMBDA: A Complete and Commented Definition.
In this lesson you will learn how to add formal comments to your K definition,
in order to nicely document it. The generated document can be then used for
various purposes: to ease understanding the K definition, to publish it,
to send it to others, etc.
The K tool allows a literate programming style, where the executable
language definition can be documented by means of annotations. One such
annotation is the latex(_)
annotation, where you can specify how to format
the given production when producing Latex output via the --output latex
option to krun
, kast
, and kprove
.
There are three types of comments, which we discuss next.
These use //
or /* ... */
, like in various programming languages. These
comments are completely ignored.
Use the @
symbol right after //
or /*
in order for the comment to be
considered an annotation and thus be processed by the K tool when it
generates documentation.
As an example, we can go ahead and add such an annotation at the beginning
of the LAMBDA module, explaining how we define the syntax of this language.
Use the !
symbol right after //
or /*
if you want the comment to be
considered a header annotation, that is, one which goes before
\begin{document}
in the generated Latex. You typically need header
annotations to include macros, or to define a title, etc.
As an example, let us set a Latex length and then add a title and an
author to this K definition.
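A sketch of how the two kinds of annotations might be combined is shown below. It assumes a version of the K tool that still processes `@` and `!` comments as described in this lesson; the length value, the title, and the author are placeholders chosen for illustration.

```k
/*!
\setlength{\parindent}{1em}
\title{LAMBDA}
\author{Your Name Here}
*/

module LAMBDA
  /*@ \section{Syntax}
      We first define the syntax of the language. */

  // An ordinary comment: ignored when generating documentation.
  syntax Exp ::= Exp Exp  [strict, left]
endmodule
```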
Compile the documentation and take a look at the results. Notice the title.
Feel free to now add lots of annotations to lambda.k
.
Then compile and check the result. Depending on your PDF viewer, you
may also see a nice click-able table of contents, with all the sections
of your document. This could be quite convenient when you define large
languages, because it helps you jump to any part of the semantics.
Tutorial 1 is now complete. The next tutorial will take us through the
definition of a simple imperative language and will expose us to more
features of the K framework and the K tool.
Here you will learn how to define a very simple imperative language in K
and the basics of how to work with configurations, cells, and computations.
Specifically, you will learn the following:
Like in the previous tutorial, this folder contains several lessons, each
adding new features to IMP. Do them in order. Also, make sure you completed
and understood the previous tutorial.
Here we learn how to define a more complex language syntax than LAMBDA's,
namely the C-like syntax of IMP. Also, we will learn how to define languages
using multiple modules, because we are going to separate IMP's syntax from
its semantics using modules. Finally, we will also learn how to use K's
builtin support for syntactic lists.
The K tool provides modules for grouping language features. In general, we
can organize our languages in arbitrarily complex module structures.
While there are no rigid requirements or even guidelines for how to group
language features in modules, we often separate the language syntax from the
language semantics in different modules.
In our case here, we start by defining two modules, IMP-SYNTAX and IMP, and
import the first in the second, using the keyword imports
. As their names
suggest, we will place all IMP's syntax definition in IMP-SYNTAX and all its
semantics in IMP.
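The resulting file skeleton might look like this sketch (the comments stand in for the contents developed in the rest of this tutorial):

```k
module IMP-SYNTAX
  // all of IMP's syntax definitions go here
endmodule

module IMP
  imports IMP-SYNTAX
  // all of IMP's semantics goes here
endmodule
```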
Note, however, that K does no more than simply include all the
contents of the imported module in the one which imports it (making sure
that everything is only kept once, even if you import it multiple times).
In other words, there is currently nothing fancy in K tool's module system.
IMP has six syntactic categories, as shown in imp.k
: AExp
for arithmetic
expressions, BExp
for Boolean expressions, Block
for blocks, Stmt
for
statements, Pgm
for programs and Ids
for comma-separated lists of
identifiers. Blocks are special statements, whose role is to syntactically
constrain the conditional statement and the while loop statement to only
take blocks as branches and body, respectively.
There is nothing special about arithmetic and Boolean expressions. They
are given the expected strictness attributes, except for <=
and &&
,
for demonstration purposes.
The <=
is defined to be seqstrict
, which means that it evaluates its
arguments in order, from left-to-right (recall that the strict
operators
can evaluate their arguments in any, fully interleaved, orders). Like
strict
, the seqstrict
annotation can also be configured; for example, one
can specify in which arguments and in what order. By default, seqstrict
refers to all the arguments, in their left-to-right order. In our case here,
it is equivalent to seqstrict(1 2)
.
The &&
is only strict in its first argument, because we will give it a
short-circuited semantics (its second argument will only be evaluated when
the first evaluates to true). Recall the K tool also allows us to associate
LaTex attributes to constructs, telling the document generator how to display
them. For example, we associate <=
the attribute latex({#1}\leq{#2})
,
which makes it be displayed as ≤ everywhere in the generated LaTex
documentation.
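Putting the above together, the expression syntax might be sketched as follows. The attributes on `<=` and `&&` are the ones discussed above; the remaining productions, their precedence grouping, and their attributes are assumptions, since imp.k is not reproduced here. The fragment is meant to sit inside the IMP-SYNTAX module.

```k
syntax AExp ::= Int | Id
              | AExp "/" AExp     [left, strict]
              | "(" AExp ")"      [bracket]
              > AExp "+" AExp     [left, strict]

syntax BExp ::= Bool
              | AExp "<=" AExp    [seqstrict, latex({#1}\leq{#2})]
              | "!" BExp          [strict]
              | "(" BExp ")"      [bracket]
              > BExp "&&" BExp    [left, strict(1)]  // short-circuit: first argument only
```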
In this tutorial we take the liberty of giving the various constructs
parsing precedences that we have already tested and know work well, so that
we can focus on the semantics here instead of syntax. In practice, though,
you typically need to experiment with precedences until you obtain the desired
parser.
Blocks are defined using curly brackets, and they can either be empty or
hold a statement.
There is nothing special about the IMP statements. Note that ;
is an assignment
statement terminator, not a statement separator. Note also that blocks are
special statements.
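A sketch of the block and statement syntax consistent with this description is given below. The `strict(2)` attribute on assignment is the one referred to in the later lesson on strictness; the other attributes and the exact concrete syntax are assumptions, since imp.k is not reproduced here.

```k
syntax Block ::= "{" "}"
               | "{" Stmt "}"

syntax Stmt ::= Block
              | Id "=" AExp ";"                       [strict(2)]
              | "if" "(" BExp ")" Block "else" Block  [strict(1)]
              | "while" "(" BExp ")" Block
              > Stmt Stmt                             [left]
```

Because the branches of `if` and the body of `while` are of sort Block, the parser rejects programs that omit the curly brackets there.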
An IMP program declares a comma-separated list of variables using the keyword
int
like in C, followed by a semicolon ;
, followed by a statement.
Syntactically, the idea here is that we can wrap any IMP program within a
main(){...}
function and get a valid C program. IMP does not allow variable
declarations anywhere else except through this construct, at the top-level of
the program. Other languages provided with the K distribution (see, e.g., the
IMP++ language also discussed in this tutorial) remove this top-level program
construct of IMP and add instead variable declaration as a statement construct,
which can be used anywhere in the program, not only at the top level.
Note how we defined the comma-separated list of identifiers using
List{Id,","}
. The K tool provides builtin support for generic syntactic
lists. In general,
syntax B ::= List{A,T}
declares a new non-terminal, B
, corresponding to T
-separated sequences of
elements of A
, where A
is a non-terminal and T
is a terminal. These
lists can also be empty, that is, IMP programs declaring no variable are also
allowed (e.g., int; {}
is a valid IMP program). To instantiate and use
the K builtin lists, you should alias each instance with a (typically fresh)
non-terminal in your syntax, like we do with the Ids
nonterminal.
Like with other K features, there are ways to configure the syntactic lists,
but we do not discuss them here.
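The program construct and the list of identifiers described above might be declared as in this sketch (the production for Pgm is assumed from the prose; the Ids declaration is the one quoted in the text):

```k
// A program: "int", a comma-separated list of identifiers, ";", a statement.
syntax Pgm ::= "int" Ids ";" Stmt

// Ids aliases the builtin list instance of Id separated by ",".
syntax Ids ::= List{Id,","}
```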
Recall from Tutorial 1 (LAMBDA) that in order for strictness to work well
we also need to tell K which computations are meant to be results. We do
this as well now, in the module IMP: integers and Booleans are K results.
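In the same style as for LAMBDA, this declaration could be sketched as:

```k
module IMP
  imports IMP-SYNTAX

  // Integers and Booleans are results of computations.
  syntax KResult ::= Int | Bool
endmodule
```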
Kompile imp.k
and test the generated parser by running some programs.
Since IMP is a fragment of C, you may want to select the C mode in your
editor when writing these programs. This will also give you the feel that
you are writing programs in a real programming language.
For example, here is sum.imp
, which sums in sum
all numbers up to n
:
int n, sum;
n = 100;
sum = 0;
while (!(n <= 0)) {
  sum = sum + n;
  n = n + -1;
}
Now krun it and see how it looks parsed in the default k
cell.
The program collatz.imp
tests the Collatz conjecture for all numbers up to
m
and accumulates the total number of steps in s
:
int m, n, q, r, s;
m = 10;
while (!(m<=2)) {
  n = m;
  m = m + -1;
  while (!(n<=1)) {
    s = s+1;
    q = n/2;
    r = q+q+1;
    if (r<=n) {
      n = n+n+n+1;      // n becomes 3*n+1 if odd
    } else {n=q;}       // or n/2 if even
  }
}
Finally, program primes.imp
counts in s
all the prime numbers up to m
:
int i, m, n, q, r, s, t, x, y, z;
m = 10;  n = 2;
while (n <= m) {
  // checking primality of n and writing t to 1 or 0
  i = 2;  q = n/i;  t = 1;
  while (i<=q && 1<=t) {
    x = i;
    y = q;
    // fast multiplication (base 2) algorithm
    z = 0;
    while (!(x <= 0)) {
      q = x/2;
      r = q+q+1;
      if (r <= x) { z = z+y; } else {}
      x = q;
      y = y+y;
    } // end fast multiplication
    if (n <= z) { t = 0; } else { i = i+1;  q = n/i; }
  } // end checking primality
  if (1 <= t) { s = s+1; } else {}
  n = n+1;
}
All the programs above will run once we define the semantics of IMP. If you
want to execute them now, wrap them in a main(){...}
function and compile
them and run them with your favorite C compiler.
Before we move to the K semantics of IMP, we would like to make some
clarifications regarding the K builtin parser, kast
. Although it is quite
powerful, you should not expect magic from it! While the K parser can parse
many non-trivial languages (see, for example, the KOOL language in
pl-tutorial/2_languages in the K distribution), it was
never meant to be a substitute for real parsers. We often call the syntax
defined in K the syntax of the semantics, to highlight the fact that its
role is to serve as a convenient notation when writing the semantics, not
necessarily as a means to define concrete syntax of arbitrarily complex
programming languages. See the KERNELC language for an example of how to connect an external parser for concrete syntax to
the K tool.
The above being said, we strongly encourage you to strive to make the
builtin parser work with your desired language syntax! Do not give up
simply because you don't want to deal with syntactic problems. On the
contrary, fight for your syntax! If you really cannot define your desired
syntax because of tool limitations, we would like to know. Please tell us.
Until now we have only seen default configurations. In the next lesson we
will learn how to define a K custom configuration.
Here we learn how to define a configuration in K. We also learn how to
initialize and how to display it.
As explained in the overview presentation on K, configurations are quite
important, because all semantic rules match and apply on them.
Moreover, they are the backbone of configuration abstraction, which allows
you to only mention the relevant cells in each semantic rule, the rest of
the configuration context being inferred automatically. The importance of
configuration abstraction will become clear when we define more complex
languages (even in IMP++). IMP does not really need it. K configurations
are constructed making use of cells, which are labeled and can be arbitrarily
nested.
Configurations are defined with the keyword configuration
. Cells are
defined using an XML-ish notation stating clearly where the cell starts
and where it ends.
While not enforced by the tool, we typically like to put the entire
configuration in a top-level cell, called T
. So let's define it:
configuration <T>...</T>
Cells can have other cells inside. In our case of IMP, we need a cell to
hold the remaining program, cell which we typically call k
, and a cell to
hold the program state. Let us add them:
configuration <T> <k>...</k> <state>...</state> </T>
K also allows us to specify how to initialize a configuration at the same
time as we declare it. All we have to do is to fill in
the contents of the cells with some terms. The syntactic categories of
those terms will also indirectly define the types of the corresponding
cells.
For example, we want the k
cell to initially hold the program that is passed
to krun
. K provides a builtin configuration variable, called $PGM
, which
is specifically designed for this purpose: krun
will place its program there
(after it parses it, of course). The K tool allows users to define their own
configuration variables, too, which can be used to develop custom
initializations of program configurations with the help of krun
; this can be
quite useful when defining complex languages, but we do not discuss it in
this tutorial.
configuration <T> <k> $PGM </k> <state>...</state> </T>
Moreover, we want the program to be a proper Pgm
term (because we do not
want to allow krun
to take fragments of programs, for example, statements).
Therefore, we tag $PGM
with the desired syntactic category, Pgm
:
configuration <T> <k> $PGM:Pgm </k> <state>...</state> </T>
Like for other variable tags in K, a run-time check will be performed and the
semantics will get stuck if the passed term is not a well-formed program.
We next tell K that the state cell should be initialized with the empty map:
configuration <T> <k> $PGM:Pgm </k> <state> .Map </state> </T>
Recall that in K .
stands for nothing. However, since there are various
types of nothing, to avoid confusion we can suffix the .
with its desired
type. K has several builtin data-types, including lists, sets, bags, and
maps. .Map
is the empty map.
Kompile imp.k
and run several programs to see how the configuration is
initialized as desired.
When configurations get large, and they do when defining large programming
languages, you may want to color the cells in order to more easily distinguish
them. This can be easily achieved using the color
cell attribute, following
again an XML-ish style:
configuration <T color="yellow"> <k color="green"> $PGM:Pgm </k> <state color="red"> .Map </state> </T>
In the next lesson we will learn how to write rules that involve cells.
Go to Lesson 3, IMP: Computations, Results, Strictness; Rules Involving Cells.
In this lesson we will learn about the syntactic category K
of computations,
about how strictness attributes are in fact syntactic sugar for rewrite rules
over computations, and why it is important to tell the tool which
computations are results. We will also see a K rule that involves cells.
Computation structures, or more simply computations, extend the abstract
syntax of your language with a list structure using ~>
(read "followed by" or "and then", and written \curvearrowright in Latex)
as a separator.
K provides a distinguished sort, K
, for computations. The extension of the
abstract syntax of your language into computations is done automatically by
the K tool when you declare constructs using the syntax
keyword, so the K
semantic rules can uniformly operate only on terms of sort K
. The intuition
for computation structures of the form
t1 ~> t2 ~> ... ~> tn
is that the listed tasks are to be processed in order. The initial
computation typically contains the original program as its sole task, but
rules can then modify it into task sequences, as seen shortly.
The strictness attributes, used as annotations to language constructs,
actually correspond to rules over computations. For example, the
strict(2)
attribute of the assignment statement corresponds to the
following two opposite rules (X
ranges over Id
and A
over AExp
):
X=A; => A ~> X=[];
A ~> X=[]; => X=A;
The first rule pulls A
from the syntactic context X=A;
and schedules it
for processing. The second rule plugs A
back into its context.
Inspired by the chemical abstract machine, we call rules of the first
type above heating rules and rules of the second type cooling rules.
Similar rules are generated for other arguments in which operations are
strict. Iterative applications of heating rules eventually bring to the
top of the computation atomic tasks, such as a variable lookup, or a
builtin operation, which then make computational progress by means of other
rules. Once progress is made, cooling rules can iteratively plug the result
back into context, so that heating rules can pick another candidate for
reduction, and so on and so forth.
When operations are strict only in some of their arguments, the corresponding
positions of the arguments in which they are strict are explicitly enumerated
in the argument of the strict
attribute, e.g., strict(2)
like above, or
strict(2 3)
for an operation strict in its second and third arguments, etc.
If an operation is simply declared strict
then it means that it is strict
in all its arguments. For example, the strictness of addition yields:
A1+A2 => A1 ~> []+A2
A1 ~> []+A2 => A1+A2
A1+A2 => A2 ~> A1+[]
A2 ~> A1+[] => A1+A2
It can be seen that such heating/cooling rules can easily lead to
non-determinism, since the same term may be heated in many different ways;
these different evaluation orders may lead to different behaviors in some
languages (not in IMP, because its expressions do not have side effects,
but we will experiment with non-determinism in its successor, IMP++).
A similar desugaring applies to sequential strictness, declared with the
keyword seqstrict
. While the order of arguments of strict
is irrelevant,
it matters in the case of seqstrict
: they are to be evaluated in the
specified order; if no arguments are given, then they are assumed by default
to be evaluated from left-to-right. For example, the default heating/cooling
rules associated to the sequentially strict <=
construct above are
(A1
, A2
range over AExp
and I1
over Int
):
A1<=A2 => A1 ~> []<=A2
A1 ~> []<=A2 => A1<=A2
I1<=A2 => A2 ~> I1<=[]
A2 ~> I1<=[] => I1<=A2
In other words, A2
is only heated/cooled after A1
is already evaluated.
While the heating/cooling rules give us a nice and uniform means to define
all the various allowable ways in which a program can evaluate, all based
on rewriting, the fact that they are reversible comes with a serious practical
problem: they make the K definitions unexecutable, because they lead to
non-termination.
To break the reversibility of the theoretical heating/cooling rules, and,
moreover, to efficiently execute K definitions, the current implementation of
the K tool relies on users giving explicit definitions of their languages'
results.
The K tool provides a predicate isKResult
, which is automatically defined
as we add syntactic constructs to KResult
(in fact the K tool defines such
predicates for all syntactic categories, which are used, for example, as
rule side conditions to check user-declared variable memberships, such as
V:Val
stating that V
belongs to Val
).
The kompile
tool, depending upon what it is requested to do, changes the
reversible heating/cooling rules corresponding to evaluation strategy
definitions (e.g., those corresponding to strictness attributes) to avoid
non-termination. For example, when one is interested in obtaining an
executable model of the language (which is the default compilation mode of
kompile
), then heating is performed only when the to-be-pulled syntactic
fragment is not a result, and the corresponding cooling only when the
to-be-plugged fragment is a result. In this case, e.g., the heating/cooling
rules for assignment are modified as follows:
X=A; => A ~> X=[]; requires notBool isKResult(A)
A ~> X=[]; => X=A; requires isKResult(A)
Note that non-termination of heating/cooling is avoided now. The only thing
lost is the number of possible behaviors that a program can manifest, but
this is irrelevant when all we want is one behavior.
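The effect of the result-guarded heating and cooling rules can be modeled outside K. The following is a minimal Python sketch, not K code and not a description of how kompile works internally. It assumes a computation is a Python list of tasks, handles only integer addition, and always heats the first non-result argument (a simplification of strict, which allows either order). The names is_kresult, step, and run_heating are hypothetical.

```python
HOLE = "[]"  # placeholder marking where a heated argument was pulled from

def is_kresult(t):
    # In this sketch the only results are Python integers.
    return isinstance(t, int)

def step(k):
    """Apply one cooling, heating, or semantic rule to computation k."""
    top = k[0]
    # Cooling: plug a result back into the frozen context that follows it.
    if is_kresult(top) and len(k) > 1 and isinstance(k[1], tuple) and k[1][0] == "freeze+":
        _, left, right = k[1]
        plugged = ("+", top if left == HOLE else left, top if right == HOLE else right)
        return [plugged] + k[2:]
    if isinstance(top, tuple) and top[0] == "+":
        _, a1, a2 = top
        if not is_kresult(a1):   # heating, guarded by "not a result"
            return [a1, ("freeze+", HOLE, a2)] + k[1:]
        if not is_kresult(a2):
            return [a2, ("freeze+", a1, HOLE)] + k[1:]
        return [a1 + a2] + k[1:]  # semantic rule for addition on integers
    return None  # no rule applies

def run_heating(term):
    """Rewrite until no rule applies; return every intermediate computation."""
    k = [term]
    trace = [k]
    while (nxt := step(k)) is not None:
        k = nxt
        trace.append(k)
    return trace

trace = run_heating(("+", ("+", 1, 2), 4))
```

Because heating fires only on non-results and cooling only on results, the two rules can never undo each other, so the loop terminates; here the trace ends with the computation `[7]`.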
As will be discussed in the IMP++ tutorial, the heating/cooling rules are
modified differently by kompile
when we are interested in other aspects
of the language definition, such as, for example, in a searchable model that
comprises all program behaviors. This latter model is obviously more general
from a theoretical perspective, but, in practice, it is also slower to execute.
The kompile
tool strives to give you the best model of the language for the
task you are interested in.
Can results be inferred automatically? This is a long story, but the short answer is: No! Maybe in some cases
it is possible, but we prefer to not attempt it in the K tool. For example,
you most likely do not want any stuck computation to count as a result,
since some of them can happen simply because you forgot a semantic rule that
could have further reduced it! Besides, in our experience with defining large
languages, it is quite useful to take your time and think of what the results
of your language's computations are. This fact in itself may help you improve
your overall language design. We typically do it at the same time as
defining the evaluation strategies of our languages. Although in theory K
could infer the results of your language as the stuck computations, based on
the above we have deliberately decided to not provide this feature, in spite
of requests from some users. So you currently do have to explicitly define
your K results if you want to effectively use the K tool. Note, however, that
theoretical definitions, not meant to be executed, need not worry about
defining results (that's because in theory semantic rules apply modulo the
reversible heating/cooling rules, so results are not necessary).
All our K rules so far in the tutorial were of the form
rule left => right requires condition
where left
and right
were syntactic, or more generally computation, terms.
Here is our first K rule explicitly involving cells:
rule <k> X:Id => I ...</k> <state>... X |-> I ...</state>
Recall that the k
cell holds computations, which are sequences of tasks
separated by ~>
. Also, the state
cell holds a map, which is a set of
bindings, each binding being a pair of computations (currently, the
K builtin data-structures, like maps, are untyped; or, said differently,
they are all over the type of computations, K
).
Therefore, the two cells mentioned in the rule above hold collections
of things, ordered or not. The ...
s, which we also call cell frames,
stand for more stuff there, which we do not care about.
The rewrite relation =>
is allowed in K to appear anywhere in a term, its
meaning being that the corresponding subterm is rewritten as indicated in the
shown context. We say that K's rewriting is local.
The rule above says that if the identifier X
is the first task in the k
cell, and if X
is bound to I
somewhere in the state
, then X
rewrites
to I
locally in the k
cell. Therefore, IMP variables need to be already
declared when looked up.
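To make the reading of the lookup rule concrete, here is a small Python sketch of the same local rewrite. It is only an illustration under stated assumptions: the configuration is a dictionary with a "k" list and a "state" dictionary, identifiers are Python strings, and the function name lookup_step is hypothetical.

```python
def lookup_step(config):
    """Model of: rule <k> X:Id => I ...</k> <state>... X |-> I ...</state>"""
    k, state = config["k"], config["state"]
    # The rule matches only if an identifier is the FIRST task in the k cell
    # and that identifier is bound somewhere in the state.
    if k and isinstance(k[0], str) and k[0] in state:
        # Only the head of k is rewritten; the rest of k and the whole state
        # (the parts matched by the "..." frames) are left untouched.
        return {"k": [state[k[0]]] + k[1:], "state": state}
    return None  # no match, e.g. an undeclared variable: the computation is stuck

before = {"k": ["x", "remaining tasks"], "state": {"x": 5, "y": 7}}
after = lookup_step(before)
stuck = lookup_step({"k": ["z"], "state": {"x": 5}})
```

The stuck case mirrors the remark above that IMP variables must already be declared when they are looked up.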
Of course, the K rule above can be translated into an ordinary rewrite rule
of the form
rule <k> X ~> Rest </k> <state> Before (X |-> I) After </state> => <k> I ~> Rest </k> <state> Before (X |-> I) After </state>
Besides being more verbose and thus tedious to write, this ordinary rule
is also more error-prone; for example, we may forget the Rest
variable
in the right-hand-side, etc. Moreover, the concurrent semantics of K
allows for its rules to be interpreted as concurrent transactions, where
the context is the read-only component of the transaction, while the
subterms which are rewritten are the read/write component of the transaction;
thus, K rule instances can apply concurrently if they only overlap
on read-only parts, while they cannot if regarded as ordinary rewrite logic
rules. Note: our current implementation of the K tool is not concurrent,
so K rules are in fact desugared as normal rewrite rules in the K tool.
Kompile imp.k
using a documentation option and check out how the K rule
looks in the generated document. The ...
frames are displayed as cell
tears, metaphorically implying that those parts of the cells that we
do not care about are torn away. The rewrite relation is replaced by a
horizontal line: specifically, the subterm which rewrites, X
, is
underlined, and its replacement is written underneath the line.
In the next lesson we define the complete K semantics of IMP and
run the programs we parsed in the first lesson.
Go to Lesson 4, IMP: Configuration Abstraction, Part 1; Types of Rules.
Here we will complete the K definition of IMP and, while doing so, we will
learn the very first step of what we call configuration abstraction.
Let us add the remaining rules, in the order in which the language constructs
were defined in IMP-SYNTAX.
The rules for the arithmetic and Boolean constructs are self-explanatory.
Note, however, that K will infer the correct sorts of all the variables in
these rules, because they appear as arguments of the builtin operations
(_+Int_
, etc.). Moreover, the inferred sorts will be enforced dynamically.
Indeed, we do not want to apply the rule for addition, for example, when the
two arguments are not integers. In the rules for &&
, although we prefer to
not do it here for simplicity, we could have eliminated the dynamic check by
replacing B
(and similarly for _
) with B:K
. Indeed, it can be shown
that whenever any of these rules apply, B
(or _
) is a BExp
anyway.
That's because there is no rule that can touch such a B
(or _
); this
will become clearer shortly, when we discuss the first step of configuration
abstraction. Therefore, since we know that B
will be a BExp
anyway, we
could save the time it takes to check its sort; such times may look minor,
but they accumulate, so some designers may prefer to avoid run-time checks
whenever possible.
The block rules are trivial. However, the rule for non-empty blocks is
semantically correct only because we do not have local variable declarations
in IMP. We will have to change this rule in IMP++.
The assignment rule has two =>
: one in the k
cell dissolving the
assignment statement, and the other in the state
cell updating the value of
the assigned variable. Note that the one in the state
is surrounded by
parentheses: (_ => I)
. That is because =>
is greedy: it matches as much
as it can to the left and to the right, until it reaches the cell boundaries
(closed or open). If you want to limit its scope, or for clarity, you can use
parentheses like here.
The rule for sequential composition simply desugars S1 S2
into S1 ~> S2
.
Indeed, the two have exactly the same semantics. Note that statements
evaluate to nothing (.
), so once S1
is processed in S1 ~> S2
, then the
next task is automatically S2
, without wasting any step for the transition.
The rules for the conditional and while statements are clear. One thing to
keep in mind now is that the while
unrolling rule will not apply
indefinitely in the positive branch of the resulting conditional, because
of K's configuration abstraction, which will be discussed shortly.
An IMP program declares a set of variables and then executes a
statement in the state obtained after initializing all those variables
to 0
. The rules for programs initialize the declared variables one by one,
checking also that there are no duplicates. We check for duplicates only for
demonstration purposes, to illustrate the keys
predefined operation that
returns the set of keys of a map, and the set membership operation in
.
In practice, we typically define a static type checker for our language,
which we execute before the semantics and reject inappropriate programs.
The use of the .Ids
in the second rule is not necessary. We could have
written int; S
instead of int .Ids; S
and the K tool would parse it and
kompile the definition correctly, because it uses the same parser used for
parsing programs also to parse the semantics. However, we typically prefer to
explicitly write the nothing values in the semantics, for clarity;
the parser has been extended to accept these. Note that the first rule
matches the entire k
cell, because int_;_
is the top-level program
construct in IMP, so there is nothing following it in the computation cell.
The anonymous variable stands for the second argument of this top-level program
construct, not for the rest of the computation. The second rule could have
also been put in a complete k
cell, but we preferred not to, for simplicity.
Our IMP semantics is now complete, but there are a few more things that we
need to understand and do.
First, let us briefly discuss the very first step of configuration abstraction.
In K, all semantic rules are in fact rules between configurations. As will soon be
explained in the IMP++ tutorial, the declared configuration cell structure is
used to automatically complete the missing configuration parts in rules.
However, many rules do not involve any cells, being rules between syntactic
terms (of sort K
); for example, we had only three rules involving cells in our
IMP semantics. In this case, the k
cell will be added automatically and the
actual rewrite will happen on top of the enclosed computation. For example,
the rule for the while
loop is automatically translated into the following:
rule <k> while (B) S => if (B) {S while (B) S} else {} ...</k>
Since the first task in computations is what needs to be done next, the
intuition for this rule completion is that the syntactic transition
only happens when the term to rewrite is ready for processing. This explains,
for example, why the while loop unrolling does not indefinitely apply in the
positive branch of the conditional: the inner while loop is not ready for
evaluation yet. We call this rule completion process, as well as other
similar ones, configuration abstraction. That is because the incomplete
rule abstracts away the configuration structure, thus being easier to read.
As we will see soon when we define IMP++, configuration abstraction is not only a
user convenience; it actually significantly increases the modularity of our
definitions. The k-cell-completion is only the very first step, though.
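The consequence of k-cell completion for the while rule can be sketched in Python as follows. This is a hedged model, not K: the computation is a Python list, statements are tuples, and the name unroll_while is hypothetical. The point it illustrates is that the rule inspects only the first task, so the while nested inside the produced conditional is not unrolled again.

```python
def unroll_while(k):
    """Model of: rule <k> while (B) S => if (B) {S while (B) S} else {} ...</k>"""
    # The completed rule matches only when the loop is the FIRST task in k.
    if k and isinstance(k[0], tuple) and k[0][0] == "while":
        loop = k[0]
        _, cond, body = loop
        unrolled = ("if", cond, ("block", ("seq", body, loop)), ("block", None))
        return [unrolled] + k[1:]
    return None  # the first task is not a while loop

k0 = [("while", "b", "s"), "rest"]
k1 = unroll_while(k0)   # one unrolling step
k2 = unroll_while(k1)   # the inner while is not at the top, so nothing happens
```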
If you really want certain rewrites over syntactic terms to apply
anywhere they match, then you should tag the rule with the attribute
anywhere
, which was discussed in Tutorial 1, Lesson 2.5.
Kompile and then krun the programs that you only parsed in Lesson 1. They
should all execute as expected. The state cell shows the final state
of the program. The k
cell shows the final code contents, which should be
empty whenever the IMP program executes correctly.
Kompile also with the documentation option and take a look at the generated
documentation. The assignment rule should particularly be of interest,
because it contains two local rewrites.
In the next lesson we comment the IMP definition and conclude this tutorial.
Here we learn no new concepts, but it is a good moment to take a break
and contemplate what we learned so far.
Let us add lots of formal annotations to imp.k
.
Once we are done with the annotations, we kompile with the documentation
option and then take a look at the produced document. We often call these
documents language posters. Depending on how much information you add to
these language posters, they can serve as standalone, formal presentations
of your languages. For example, you can print them as large posters and
post them on the wall, or in poster sessions at conferences.
This completes our second tutorial. The next tutorials will teach us more
features of the K framework, such as how to define languages with complex
control constructs (like callcc
), languages which are concurrent, and so on.
Here you will learn how to define language constructs which abruptly change
the execution control flow, and how to define language semantics following
an environment/store style. Specifically, you will learn the following:
callcc, which allow you to take snapshots of
Here we add call-with-current-continuation (callcc) to the definition of
LAMBDA completed in Tutorial 1, and call the resulting language LAMBDA++.
While doing so, we will learn how to define language constructs that
abruptly change the execution control flow.
Take over the lambda.k
definition from Lesson 8 in Part 1 of this Tutorial,
which is the complete definition of the LAMBDA language, but without the
comments.
callcc
is a good example for studying the capabilities of a framework to
support abrupt changes of control, because it is one of the most
control-intensive language constructs known. Scheme is probably the first
programming language that incorporated the callcc
construct, although
similar constructs have been recently included in many other languages in
one form or another.
Here is a quick description: callcc e
passes the remaining computation
context, packaged as a function k
, to e
(which is expected to be a function);
if during its evaluation e
passes any value to k
, then the current
execution context is discarded and replaced by the one encoded by k
and
the value is passed to it; if e
evaluates normally to some value v
and
passes nothing to k
in the process, then v
is returned as a result of
callcc e
and the execution continues normally. For example, we want the
program callcc-jump.lambda
:
(callcc (lambda k . ((k 5) + 2))) + 10
to evaluate to 15
, not 17
! Indeed, the computation context [] + 10
is
passed to callcc
's argument, which then sends it a 5
, so the computation
resumes to 5 + 10
. On the other hand, the program callcc-not-jump.lambda:
(callcc (lambda k . (5 + 2))) + 10
evaluates to 17
.
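The two programs above can be mimicked in Python by writing them in continuation-passing style, where every function receives the rest of the computation as an extra argument. This is only an illustrative sketch and says nothing about how K executes callcc; the names callcc, jump, not_jump, and identity are hypothetical.

```python
def callcc(f, k):
    """Call f with the current continuation k packaged as a function value."""
    # When the packaged continuation is invoked, it ignores the continuation
    # at its call site (_k) and resumes the saved one instead.
    reified = lambda v, _k: k(v)
    return f(reified, k)

def jump(final):
    # (callcc (lambda k . ((k 5) + 2))) + 10
    body = lambda kk, ret: kk(5, lambda r: ret(r + 2))
    return callcc(body, lambda v: final(v + 10))

def not_jump(final):
    # (callcc (lambda k . (5 + 2))) + 10
    body = lambda kk, ret: ret(5 + 2)
    return callcc(body, lambda v: final(v + 10))

identity = lambda v: v
jump_result = jump(identity)          # the pending "+ 2" is discarded
not_jump_result = not_jump(identity)  # k is never used, so evaluation is normal
```

In jump, passing 5 to the packaged continuation discards the pending "+ 2" and resumes with "[] + 10", giving 15; in not_jump the body returns 7 normally, giving 17.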
If you like playing games, you can metaphorically think of callcc e
as
saving your game state in a file and passing it to your friend e
.
Then e
can decide at some moment to drop everything she was doing, load
your game and continue to play it from where you were.
The behavior of many popular control-changing constructs can be obtained
using callcc
. The program callcc-return.lambda
shows, for example, how to
obtain the behavior of a return
statement, which exits the current execution
context inside a function and returns a value to the caller's context:
letrec f x = callcc (lambda return . (
  f (if (x <= 0) then ((return 1) / 0) else 2)
))
in (f -3)
This should evaluate to 1
, in spite of the recursive call to f
and of the division by zero! Note that return
is nothing but a variable
name, but one which is bound to the current continuation at the beginning of
the function execution. As soon as 1
is passed to return
, the computation
jumps back in time to where callcc
was defined! Change -3
to 3
and the
program will loop forever.
callcc
is quite a powerful and beautiful language construct, although one
which is admittedly hard to give semantics to in some frameworks.
But not in K 😃 Here is the entire K syntax and semantics of callcc
:
syntax Exp ::= "callcc" Exp [strict]
syntax Val ::= cc(K)
rule <k> (callcc V:Val => V cc(K)) ~> K </k>
rule <k> cc(K) V ~> _ => V ~> K </k>
Let us first discuss the annotated syntax. We declared callcc
strict,
because its argument may not necessarily be a function yet, so it may need
to be evaluated. As explained above, we need to encode the remaining
computation somehow and pass it to callcc
's argument. More specifically,
since LAMBDA is call-by-value, we have to encode the remaining computation as
a value. We do not want to simply subsort computations to Val
, because there
are computations which we do not want to be values. A simple solution to
achieve our goal here is to introduce a new value construct, say cc
(from
current-continuation), which holds any computation.
Note that, inspired by SDF,
K allows you to define the syntax of helping semantic operations, like cc
,
more compactly. Typically, we do not need a fancy syntax for such operators;
all we need is a name, followed by open parenthesis, followed by a
comma-separated list of arguments, followed by closed parenthesis. If this
is the syntax that you want for a particular construct, then K allows you to
drop all the quotes surrounding the terminals, as we did above for cc
.
The semantic rules do exactly what the English semantics of callcc
says.
Note that here, unlike in our definition of LAMBDA in Tutorial 1, we had
to mention the cell <k/>
in our rules. This is because we need to make sure
that we match the entire remaining computation, not only a fragment of it!
For example, if we replace the two rules above with
rule (callcc V:Val => V cc(K)) ~> K
rule cc(K) V ~> _ => V ~> K
then we get a callcc
which is allowed to non-deterministically pick a
prefix of the remaining computation and pass it to its argument, and then
when invoked within its argument, a non-deterministic prefix of the new
computation is discarded and replaced by the saved one. Wow, that would
be quite a language! Would you like to write programs in it? 😃
Consequently, in K we can abruptly change the execution control flow of a
program by simply changing the contents of the <k/>
cell. This is one of
the advantages of having an explicit representation of the execution context,
like in K or in reduction semantics with evaluation contexts. Constructs like
callcc
are very hard and inelegant to define in frameworks such as SOS,
because those implicitly represent the execution context as proof context,
and the latter cannot be easily changed.
Now that we know how to handle cells in configurations and use them in rules,
in the next lesson we take a fresh look at LAMBDA and define it using
an environment-based style, which avoids the complexity of substitution
(e.g., having to deal with variable capture) and is closer in spirit to how
functional languages are implemented.
Go to Lesson 2, LAMBDA++: Semantic (Non-Syntactic) Computation Items.
In this lesson we start another semantic definition of LAMBDA++, which
follows a style based on environments instead of substitution. In terms of
K, we will learn how easy it is to add new items to the syntactic category
of computations K
, even ones which do not have a syntactic nature.
An environment binds variable names of interest to locations where their
values are stored. The idea of environment-based definitions is to maintain
a global store mapping locations to values, and then have environments
available when we evaluate expressions telling where the variables are
located in the store. Since LAMBDA++ is a relatively simple language, we
only need to maintain one global environment. Following a style similar
to that of IMP, we place all cells into a top cell T
:
configuration <T>
                <k> $PGM:Exp </k>
                <env> .Map </env>
                <store> .Map </store>
              </T>
Recall that $PGM
is where the program is placed by krun
after parsing. So
the program execution starts with an empty environment and an empty store.
In environment-based definitions of lambda-calculi, lambda abstractions
evaluate to so-called closures:
rule <k> lambda X:Id . E => closure(Rho,X,E) ...</k> <env> Rho </env>
A closure is like a lambda abstraction, but it also holds the environment
in which it was declared. This way, when invoked, a closure knows where to
find in the store the values of all the variables that its body expression
refers to. We will define the lookup rule shortly.
Therefore, unlike in the substitution-based definitions of LAMBDA and
LAMBDA++, neither the lambda abstractions nor the identifiers are values
anymore here, because they both evaluate further: lambda abstractions to
closures and identifiers to their values in the store. In fact, the only
values at this moment are the closures, and they are purely semantic entities,
which cannot be used explicitly in programs. That's why we modified the
original syntax of the language to include no Val
syntactic category
anymore, and that's why we need to add closures as values now; as
before, we add a Val syntactic category which is subsorted to KResult. In general, whenever you have any strictness attributes,
you should also define some K results.
Invoking a closure is a bit more involved than the substitution-based
beta-reduction: we need to switch to the closure's environment, then create a
new, or fresh, binding for the closure's parameter to the value passed to the
closure, then evaluate the closure's body, and then switch back to the
caller's environment, which needs to be stored somewhere in the meanwhile.
We can do all these with one rule:
rule <k> closure(Rho,X,E) V:Val => E ~> Rho' ...</k>
     <env> Rho' => Rho[X <- !N] </env>
     <store>... .Map => (!N:Int |-> V) ...</store>
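Therefore, we atomically do all the following:
switch the computation to the closure's body, E, followed by a
caller-environment-recovery task Rho' (note that Rho' is the current environment),
generate a fresh location !N (the ! is important, we discuss it below),
bind X to !N in closure's environment and switch the current environment Rho'
to that one,
write the value passed to the closure, V, at location !N.
This was the most complex K rule we've seen so far in the tutorial. Note,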
however, that this one rule achieves a lot. It is, in fact, quite compact
considering how much it does. Note also that everything that this K rule
mentions is needed also conceptually in order to achieve this task, so it
is minimal from that point of view. That would not be the case if we
used, instead, a conventional rewrite rule, because we would have had to
mention the remaining store, say Sigma
, in both sides of the rule, to say
it stays unchanged. Here we just use ...
.
The declaration of the fresh variable above, !N
, is new and needs
some explanation. First, note that !N
appears only in the right-hand-side
terms in the rule, that is, it is not matched when the rule is applied.
Instead, a fresh Int
element is generated each time the rule is applied.
In K, we can define syntactic categories which have the capability to
generate fresh elements like above, using unbound variables whose name starts
with a !
. The details of how to do that are beyond the scope of this
tutorial (see Tutorial 6). All we need to know here is that an arbitrary
fresh element of that syntactic category is generated each time the rule
is applied. We cannot rely on the particular name or value of the generated
element, because that can change with the next version of the K tool, or
even from execution to execution with the same version. All you can rely
on is that each newly generated element is distinct from the previously
generated elements for the same syntactic category.
Unlike in the substitution-based definition, we now also need a lookup rule:
rule <k> X => V ...</k> <env>... X |-> N ...</env> <store>... N |-> V ...</store>
This rule speaks for itself: replace X
by the value V
located in the store
at X
's location N
in the current environment.
The only thing left to define is the auxiliary environment-recovery operation:
rule
When the item preceding the environment recovery task Rho
in the
computation becomes a value, replace the current environment with Rho
and dissolve Rho
from the computation.
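The environment/store style described above (closures, fresh locations, lookup through the environment, and restoring the caller's environment) can be summarized in a small Python evaluator. This is a sketch under assumptions, not a translation of the K rules: expressions are tuples, fresh locations come from a counter standing in for !N, and Python's own call stack plays the role of the environment-recovery task. The name run_env and the tuple encoding are hypothetical.

```python
import itertools

def run_env(program):
    """Evaluate a tiny lambda language with an environment and a global store."""
    store = {}                  # location -> value
    fresh = itertools.count()   # stand-in for K's fresh !N

    def evaluate(e, env):       # env maps variable names to locations
        kind = e[0]
        if kind == "num":
            return e[1]
        if kind == "var":       # lookup: env gives the location, store the value
            return store[env[e[1]]]
        if kind == "lam":       # a lambda evaluates to a closure holding env
            return ("closure", env, e[1], e[2])
        if kind == "add":
            return evaluate(e[1], env) + evaluate(e[2], env)
        if kind == "app":
            _, rho, x, body = evaluate(e[1], env)
            v = evaluate(e[2], env)
            n = next(fresh)     # fresh location for the parameter
            store[n] = v
            # Evaluate the body in the closure's environment extended with x;
            # on return the caller keeps using its own env.
            return evaluate(body, {**rho, x: n})
        raise ValueError(f"unknown expression: {e!r}")

    return evaluate(program, {}), store

# ((lambda x . lambda y . x + y) 1) 2
prog = ("app",
        ("app", ("lam", "x", ("lam", "y", ("add", ("var", "x"), ("var", "y")))),
                ("num", 1)),
        ("num", 2))
value, final_store = run_env(prog)
```

Evaluating a free variable in this sketch raises a KeyError, which loosely corresponds to a computation getting stuck on an unbound identifier.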
Let us kompile and ... fail:
kompile lambda
gives a parsing error saying that V:Val
does not fit there in the closure
invocation rule. That's because Val
and Exp
are currently completely
disconnected, so K rightfully complains that we want to apply a value to
another one, because application was defined to work with expressions, not
values. What we forgot here was to state that Exp
includes Val
:
syntax Exp ::= Val
Now everything works, but it is a good time to reflect a bit.
So we added closures, which are inherently semantic entities, to the syntax
of expressions. Does that mean that we can now write LAMBDA programs with
closures in them? Interestingly, with our current definition of LAMBDA,
which purposely did not follow the nice organization of IMP into syntax and
semantic modules, and with K's default parser, kast
, you can. But you are
not supposed to rely on this! In fact, if you use an external parser, that
parser will reject programs with explicit closures. Also, if we split the
LAMBDA definition into two modules, one called LAMBDA-SYNTAX containing
exclusively the desired program syntax and one called LAMBDA importing the
former and defining the syntax of the auxiliary operations and the semantics,
then even K's default parser will reject programs using auxiliary syntactic
constructs.
Indeed, when you kompile a language, say lang.k
, the tool will by default
attempt to find a module LANG-SYNTAX and generate the program parser from
that. If it cannot find it, then it will use the module LANG instead. There
are also ways to tell kompile precisely which syntax module you want to use
for the program parser if you don't like the default convention.
See kompile --help
.
Another insightful thought to reflect upon is the relationship between your
language's values and other syntactic categories. It is often the case that
values form a subset of the original language syntax, like in IMP (Part 2 of
the tutorial), but sometimes that is not true, like in our case here. When
that happens, in order for the semantics to be given smoothly and uniformly
using the original syntax, you need to extend your language's original
syntactic categories with the new values. The same holds true in other
semantic approaches, not only in K, even in ones which are considered purely
syntactic. As should be clear by now, K does not force you to use a
purely syntactic style in your definitions; nevertheless, K does allow you to
develop purely syntactic definitions, like LAMBDA in Part 1 of the tutorial,
if you prefer those.
krun
some programs, such as those provided in Lesson 1 of the LAMBDA
tutorial (Part 1). Note the closures, both as results in the <k/>
cell,
and as values in the store. Also, since variables are not values anymore,
expressions that contain free variables may get stuck with one of those on
top of their computation. See, for example, free-variable-capture.lambda
,
which gets stuck on z
, because z
is free, so it cannot evaluate it.
If you want, you can go ahead and manually provide a configuration with
z
mapped to some location in the environment and that location mapped to
some value in the store, and then you can also execute this program. The
program omega.lambda
should still loop.
Although we completely changed the definitional style of LAMBDA, the semantics
of the other constructs do not need to change, as seen in the next lesson.
In this lesson we will learn that, in some cases, we can reuse existing
semantics of language features without having to make any change!
Although the definitional style of the basic LAMBDA language changed quite
radically in our previous lesson, compared to its original definition in
Part 1 of the tutorial, we fortunately can reuse a large portion of the
previous definition. For example, let us just cut-and-paste the rest of the
definition from Lesson 7 in Part 1 of the tutorial.
Let us kompile
and krun
all the remaining programs from Part 1 of the
tutorial. Everything should work fine, although the store contains lots of
garbage. Garbage collection is an interesting topic, but we do not do it
here. Nevertheless, much of this garbage is caused by the intricate use of
the fixed-point combinator to define recursion. In a future lesson in this
tutorial we will see that a different, environment-based definition of
fixed-points will allocate much less memory.
One interesting question at this stage is: how do we know when we can reuse
an existing semantics of a language feature? Well, I'm afraid the answer is:
we don't. In the next lesson we will learn how reuse can fail for quite subtle
reasons, which are impossible to detect statically (and some non-experts may
fail to even detect them at all).
It may be tempting to base your decision to reuse an existing semantics of
a language feature solely on syntactic considerations; for example, to reuse
whenever the parser does not complain. As seen in this lesson, this could
be quite risky.
Let's try (and fail) to reuse the definition of callcc
from Lesson 1:
syntax Exp ::= "callcc" Exp [strict] syntax Val ::= cc(K) rule <k> (callcc V:Val => V cc(K)) ~> K </k> rule <k> cc(K) V ~> _ => V ~> K </k>
The callcc
examples that we tried in Lesson 1 work, so it may look like it works.
However, the problem is that cc(K)
should also include an environment,
and that environment should also be restored when cc(K)
is invoked.
Let's try to illustrate this bug with callcc-env1.lambda:
let x = 1 in ((callcc lambda k . (let x = 2 in (k x))) + x)
where the second argument of +
, x
, should be bound to the top x
, which
is 1. However, since callcc
does not restore the environment, that x
should be looked up in the wrong, callcc-inner environment, so we should see
the overall result 4.
Hm, we get the right result, 3 ... (Note: you may get 4, depending on
your version of K and platform; but both 3 and 4 are possible results, as
explained below and seen in the tests). How can we get 3? Well, recall that
+ is strict, which means that it can evaluate its arguments in any order.
It just happened that in the execution that took place above its second
argument was evaluated first, to 1, and then the callcc was evaluated, but
its cc value K had already included the 1 instead of x ... In Part 4 of
the tutorial we will see how to explore all the non-deterministic behaviors of
a program; we could use that feature of K to debug semantics, too.
For example, in this case, we could search for all behaviors of this program
and we would indeed get two possible value results: 3 and 4.
One may think that the problem is the non-deterministic evaluation order
of +, and thus that all we need to do is to enforce a deterministic order
in which the arguments of + are evaluated. Let us follow this path to
see what happens. There are two simple ways to make the evaluation order
of +'s arguments deterministic. One is to make + seqstrict in the
semantics, to enforce its evaluation from left-to-right. Do it and then
run the program above again; you should get only one behavior for the
program above, 4, which therefore shows that copying-and-pasting our old
definition of callcc was incorrect. However, as seen shortly, that only
fixed the problem for the particular example above, but not in general.
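As a minimal sketch of the first option, only the strictness attribute on the production for + changes. The shape of the production is an assumption here: it presumes that + was declared in the LAMBDA syntax as a left-associative strict binary operator, and any priority grouping with the other arithmetic productions is omitted.

```k
// Hypothetical excerpt: the original production is assumed to read
//   Exp "+" Exp [strict, left]
// seqstrict evaluates the arguments in order, from left to right.
syntax Exp ::= Exp "+" Exp [seqstrict, left]
```

With this change the callcc argument of + is always evaluated before x, which is why only the result 4 remains for callcc-env1.lambda.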
Another conventional approach to enforce the desired evaluation order is to
modify the program to enforce the left-to-right evaluation order using let
binders, as we do in callcc-env2.lambda:
let x = 1 in let a = callcc lambda k . (let x = 2 in (k x)) in let b = x in (a + b)
With your installation of K you may get the "expected" result 4 when you
execute this program, so it may look like our non-deterministic problem is
fixed. Unfortunately, it is not. Using the K tool to search for all the
behaviors in the program above reveals that the final result 3 is still
possible. Moreover, both the 3 and the 4 behaviors are possible regardless
of whether + is declared to be seqstrict or just strict. How is that
possible? The problem is now the non-deterministic evaluation strategy of
the function application construct. Indeed, recall that the semantics of
the let-in construct is defined by desugaring to lambda application:
rule let X = E in E' => (lambda X . E') E
With this, the program above eventually reduces to
(lambda a . ((lambda b . a + b) x)) (callcc lambda k . (let x = 2 in (k x)))
in an environment where x is 1. If the first expression evaluates first,
then it does so to a closure in which x is bound to a location holding 1,
so when applied later on to the x inside the argument of callcc (which is
2), it will correctly look up x in its enclosed environment and thus the
program will evaluate to 3. On the other hand, if the second expression
evaluates first, then the cc value will freeze the first expression as is,
breaking the relationship between its x and the current environment in which
it is bound to 1, being inadvertently captured by the environment of the
let-in construct inside the callcc and thus making the entire expression
evaluate to 4.
So the moral is: Do not reuse blindly. Think!
In the next lesson we fix the environment-based semantics of callcc by having
cc also wrap an environment, besides a computation. We will also give a more
direct semantics to recursion, based on environments instead of fixed-point
combinators.
In this lesson we see more examples of semantic (i.e., non-syntactic)
computational items, and how useful they can be. Specifically, we fix the
environment-based definition of callcc
and give an environment-based
definition of the mu
construct for recursion.
Let us first fix callcc. As discussed in Lesson 4, the problem that we
noticed there was that we only recovered the computation, but not the
environment, when a value was passed to the current continuation. This is
quite easy to fix: we modify cc to take both an environment and a
computation, and its rules to take a snapshot of the current environment with
it, and to recover it at invocation time:
syntax Val ::= cc(Map,K)
rule <k> (callcc V:Val => V cc(Rho,K)) ~> K </k> <env> Rho </env>
rule <k> cc(Rho,K) V:Val ~> _ => V ~> K </k> <env> _ => Rho </env>
Let us kompile and make sure it works with the callcc-env2.lambda
program,
which should evaluate to 3, not to 4.
Note that the cc
value, which can be used as a computation item in the <k/>
cell, is now quite semantic in nature, pretty much the same as the closures.
Let us next add one more closure-like semantic computational item, for mu.
But before that, let us reuse the semantics of letrec in terms of mu that
was defined in Lesson 8 of Part 1 of the tutorial on LAMBDA:
syntax Exp ::= "letrec" Id Id "=" Exp "in" Exp [macro]
             | "mu" Id "." Exp [latex(\mu{#1}.{#2})]
rule letrec F:Id X = E in E' => let F = mu F . lambda X . E in E'
We removed the binder annotation of mu, because it is not necessary
anymore (since we do not work with substitutions anymore).
To reduce the number of locations needed to evaluate mu X . E, let us replace
it with a special closure which already binds X to a fresh location holding
the closure itself:
syntax Exp ::= muclosure(Map,Exp)
rule <k> mu X . E => muclosure(Rho[X <- !N], E) ...</k>
     <env> Rho </env>
     <store>... .Map => (!N:Int |-> muclosure(Rho[X <- !N], E)) ...</store>
Since each time mu X . E
is encountered during the evaluation it needs to
evaluate E
, we conclude that muclosure
cannot be a value. We can declare
it as either an expression or as a computation. Let's go with the former.
Finally, here is the rule unrolling the muclosure:
rule <k> muclosure(Rho,E) => E ~> Rho' ...</k> <env> Rho' => Rho </env>
Note that the current environment Rho'
needs to be saved before and
restored after E
is executed, because the fixed point may be invoked
from a context with a completely different environment from the one
in which mu X . E
was declared.
We are done. Let us now kompile
and krun
factorial-letrec.lambda
from
Lesson 7 in Part 1 of the tutorial on LAMBDA. Recall that in the previous
lesson this program generated a lot of garbage into the store, due to the
need to allocate space for the arguments of all those lambda abstractions
needed to run the fixed-point combinator. Now we need far fewer locations,
essentially only locations for the argument of the factorial function, one at
each recursive call. Anyway, much better than before.
In the next lesson we wrap up the environment definition of LAMBDA++ and
generate its documentation.
Go to Lesson 6, LAMBDA++: Wrapping Up and Documenting LAMBDA++.
In this lesson we wrap up and nicely document LAMBDA++. In doing so, we also
take the freedom to reorganize the semantics a bit, to make it look better.
See the lambda.k
file, which is self-explanatory.
Part 3 of the tutorial is now complete. Part 4 will teach you more features
of the K framework, in particular how to exhaustively explore the behaviors
of non-deterministic or concurrent programs.
IMP++ extends IMP, which was discussed in Part 2 of this tutorial, with several
new syntactic constructs. Also, some existing syntax is generalized, which
requires non-modular changes of the existing IMP semantics. For example,
global variable declarations become local declarations and can occur
anywhere a statement can occur. In this tutorial we will learn the following:
superheat/supercool options of kompile.
search option of krun works.
Like in the previous tutorials, this folder contains several lessons, each
adding new features to IMP++. Do them in order and make sure you completed
and understood the previous tutorials.
Here we learn how to extend the syntax of an existing language, both with
new syntactic constructs and with more general uses of existing constructs.
The latter, in particular, requires changes of the existing semantics.
Consider the IMP language, as defined in Lesson 4 of Part 2 of the tutorial.
Let us first add the new syntactic constructs, with their precedences:
++, which increments an integer variable and evaluates to its new value;
read, which reads and evaluates to a new integer from the input buffer;
print, which takes a comma-separated list of arithmetic expressions, AExps, and evaluates and prints each of them in order; note we do not want to declare print to be strict, because we do not want to first evaluate all the arguments and then print them;
halt, which abruptly terminates the program; and
spawn, which takes a statement and creates a new concurrent thread executing it.
Also, we want to allow local variable declarations, which can appear anywhere
a statement can appear. Their scope ranges from the place they are defined
until the end of the current block, and they can shadow previous declarations,
both inside and outside the current block. The simplest way to define the
syntax of the new variable declarations is as ordinary statements, at the same
time removing the previous Pgm
syntactic category and its construct.
Programs are now just statements.
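The new syntax described above could be sketched as follows. This is an excerpt under stated assumptions, not the lesson's actual grammar: the existing IMP productions and all priority grouping are omitted, the parentheses after read are a guess, and String is assumed to be added to AExp so that print can take strings. The tokens for print, halt, spawn and int follow the examples and rules shown elsewhere in this part.

```k
// Hypothetical excerpt: new expression constructs.
syntax AExp  ::= String
               | "++" Id
               | "read" "(" ")"              // token shape assumed

// Comma-separated list of arithmetic expressions, used by print.
syntax AExps ::= List{AExp,","}

// New statements; variable declarations are now ordinary statements.
syntax Stmt  ::= "int" Ids ";"
               | "print" "(" AExps ")" ";"   // deliberately not strict
               | "halt" ";"
               | "spawn" Stmt
```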
We are now done with adding the new syntax and modifying the old one.
Note that the old syntax was modified in a way which makes the previous IMP
programs still parse, but this time as statements. Let us then modify
the configuration variable $PGM to have the sort Stmt instead of Pgm,
and let us try to run the old IMP programs, for example sum.imp.
Note that they actually get stuck with the global declaration on the top
of their computations. This is because variable declarations are now treated
like any statements, in particular, the sequential composition rule applies.
This makes the old IMP rule for global variable declarations not match anymore.
We can easily fix it by replacing the anonymous variable _, which matched
the program's statement that now turned into the remaining computation in
the <k/> cell, with the cell frame variable ..., which matches the
remaining computation. Similarly, we have to change the rule for the case
where there are no variables left to declare into one that dissolves itself.
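A sketch of the two adjusted rules, at this stage still over the <state/> cell. The exact shape of the IMP rules from Part 2, including the side condition that X is not already declared, is assumed here rather than quoted from that lesson.

```k
// Declare the variables one by one. The cell frame "..." now matches the
// rest of the computation, where the IMP rule used the anonymous variable _.
rule <k> int (X,Xs => Xs); ...</k>
     <state> Rho:Map (.Map => X |-> 0) </state>
  requires notBool (X in keys(Rho))

// No variables left to declare: the statement dissolves into the empty
// computation (.K is the explicitly sorted spelling of ".").
rule int .Ids; => .K
```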
We can now run all the previous IMP programs, in spite of the fact that
our IMP++ semantics is incomplete and, more interestingly, in spite of the
fact that our current semantics of blocks is incorrect in what regards the
semantics of local variable declarations (note that the old IMP programs do
not declare block-local variables, which is why they still run correctly).
Let us also write some proper IMP++ programs, which we would like to execute
once we give semantics to the new constructs.
div.imp
is a program manifesting non-deterministic behaviors due to the
desired non-deterministic evaluation strategy of division and the fact that
expressions will have side effects once we add variable increment. We will
be able to see all the different behaviors of this program. Challenge: can
you identify the behavior where the program performs a division-by-zero?
If we run div.imp
now, it will get stuck with the variable increment
construct on top of the computation cell. Once we give it a semantics,
div.imp
will execute completely (all the other constructs in div.imp
already have their semantics defined as part of IMP).
Note that some people prefer to define all their semantics in a by need
style, that is, they first write and parse lots of programs, and then they
add semantics to each language construct on which any of the programs gets
stuck, and so on and so forth until they can run all the programs.
io.imp
is a program which exercises the input/output capabilities of the
language: reads two integers and prints three strings and an integer.
Note that the variable declaration is not the first statement anymore.
sum-io.imp
is an interactive variant of the sum program.
spawn.imp
is a program which dynamically creates two threads that interact
with the main thread via the shared variable x. Lots of behaviors will be
seen here once we give spawn the right semantics.
Finally, locals.imp
tests whether variable shadowing/unshadowing works well.
In the next lesson we will prepare the configuration for the new constructs,
and will see what it takes to adapt the semantics to the new configuration.
Specifically, we will split the state cell into an environment cell and a
store cell, like in LAMBDA++ in Part 3 of the tutorial.
To prepare for the semantics of threads and local variables, in this lesson we
split the state cell into an environment and a store. The environment and
the store will be similar to those in the definition of LAMBDA++ in Part
3 of the Tutorial. This configuration refinement will require us to change
some of IMP's rules, namely those that used the state.
To split the state map, which binds program variables to values, into an
environment mapping program variables to locations and a store mapping
locations to values, we replace in the configuration declaration the cell
<state color="red"> .Map </state>
with two cells
<env color="LightSkyBlue"> .Map </env>
<store color="red"> .Map </store>
Structurally speaking, this split of a cell into other cells is a major
semantic change, which, unfortunately, requires us to revisit the existing
rules that used the state cell. One could, of course, argue that we could
have avoided this problem if we had followed from the very beginning the
good-practice style to work with an environment and a store, instead of a
monolithic state. While that is a valid argument, highlighting the fact that
modularity is not only a feature of the framework alone, but one should also
follow good practices to achieve it, it is also true that if all we wanted
in Part 2 of the tutorial was to define IMP as is, then the split of the state
in an environment and a store is unnecessary and not really justified.
The first rule which used a state cell is the lookup rule:
rule <k> X:Id => I ...</k> <state>... X |-> I ...</state>
We modify it as follows:
rule <k> X:Id => I ...</k> <env>... X |-> N ...</env> <store>... N |-> I ...</store>
So we first match the location N of X in the environment, then the value
I at location N in the store, and finally we rewrite X to I in the
computation. This rule also shows an instance of a more complex
multiset matching, where two variables (X and N) are each matched twice.
The assignment rule is modified quite similarly.
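For instance, assuming the IMP assignment rule overwrote the value bound to X in the <state/> cell, its environment/store counterpart could look like the following sketch.

```k
// Find the location N of X in the environment, then overwrite whatever
// value is stored at N; the assignment statement itself dissolves.
rule <k> X = I:Int; => .K ...</k>
     <env>... X |-> N ...</env>
     <store>... N |-> (_ => I) ...</store>
```

As in the lookup rule, X and N each occur twice, which ties the three cells together.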
The variable declaration rule is trickier, though, because we need to allocate
a fresh location in the store and bind the newly declared variable to it.
This is quite similar to the way we allocated space for variables in
the environment-based definition of LAMBDA++ in Part 3 of the tutorial.
rule <k> int (X,Xs => Xs); ...</k> <env> Rho => Rho[X <- !N:Int] </env> <store>... .Map => !N |-> 0 ...</store>
Note the use of the fresh (!N) variable notation above. Recall from
the LAMBDA++ tutorial that each time the rule with fresh (!) variables is
applied, fresh elements of corresponding sorts are generated for the fresh
variables, distinct from all the previously generated elements; also, we
cannot and should not assume anything about the particular element that is
being generated, except that it is different from the previous ones.
kompile
and krun
sum.imp
to see how the fresh locations have been
generated and used. There were two fresh locations needed, for the two
variables. Note also that a cell holding the counter has been added to the
configuration.
In the next lesson we will add the semantics of variable increment, and see
how that yields non-deterministic behaviors in programs and how to explore
those behaviors using the K tool.
Go to Lesson 3, IMP++: Tagging; Superheat/Supercool Kompilation Options.
In this lesson we add the semantics of variable increment. We also learn
how to instruct the kompile
tool to instrument the language model for
exhaustive analysis.
The variable increment rule is self-explanatory:
rule <k> ++X => I +Int 1 ...</k> <env>... X |-> N ...</env> <store>... N |-> (I => I +Int 1) ...</store>
We can now run programs like our div.imp
program introduced in Lesson 1.
Do it.
The addition of increment makes the evaluation of expressions have side
effects. That, in combination with the non-determinism allowed by the
strictness attributes in how expression constructs evaluate their
arguments, makes expressions in particular and programs in general have
non-deterministic behaviors. One possible execution of the div.imp program
assigns 1 to y's location, for example, but this program manifests several
other behaviors, too.
To see all the (final-state) behaviors that a program can have, you can kompile
the semantics with --enable-search and call the krun tool with the option
--search. For example:
krun div.imp --search
In the next lesson we add input/output to our language and learn how to
generate a model of it which behaves like an interactive interpreter!
Go to Lesson 4, IMP++: Semantic Lists; Input/Output Streaming.
In this lesson we add semantics to the read
and print
IMP++ constructs.
In doing so, we also learn how to use semantic lists and how to connect
cells holding semantic lists to the standard input and standard output.
This allows us to turn the K semantics into an interactive interpreter.
We start by adding two new cells to the configuration,
<in color="magenta"> .List </in>
<out color="Orchid"> .List </out>
each holding a semantic list, initially empty. Semantic lists are
space-separated sequences of items, each item being a term of the form
ListItem(t), where t is a term of sort K. Recall that the semantic maps,
which we use for states, environments, stores, etc., are sets of pairs
t1 |-> t2, where t1 and t2 are terms of sort K. The ListItem wrapper
is currently needed, to avoid parsing ambiguities.
Since we want the print
statement to also print strings, we need to tell
K that strings are results. To make it more interesting, let us also overload
the +
symbol on arithmetic expressions to also take strings and, as a
result, to concatenate them. Since +
is already strict, we only need to add
a rule reducing the IMP addition of strings to the builtin operation +String
which concatenates two strings.
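These two steps might be written as in the sketch below. It assumes that IMP declared integers and Booleans as results and that String has been added to the AExp syntax; the variable names Str1 and Str2 are illustrative.

```k
// Strings count as results, so strict constructs stop evaluating them.
syntax KResult ::= Int | Bool | String

// When both arguments of IMP's + are strings, reduce to the builtin
// string concatenation.
rule Str1:String + Str2:String => Str1 +String Str2
```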
The semantics of read is immediate: it reads and consumes the first integer item
from the <in/> cell; note that our read only reads integer values (it gets
stuck if the first item in the <in/> cell is not an integer).
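A sketch of such a rule, assuming read is written read() in programs:

```k
// Consume the first item of the input list, provided it is an integer,
// and let read() evaluate to it.
rule <k> read() => I ...</k>
     <in> ListItem(I:Int) => .List ...</in>
```

Because the pattern requires ListItem(I:Int) at the front of the list, the rule does not match when the first item has any other sort, which is exactly the stuck case mentioned above.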
The semantics of print
is a bit trickier. Recall that print
takes an
arbitrary number of arithmetic expression arguments, and evaluates and outputs
each of them in order, from left to right. For example,
print("Hello", 3/0, "Bye");
outputs "Hello" and then gets stuck on the
illegal division by zero operation. In other words, we do not want it to
first evaluate all its arguments and then print them, because that would miss
outputting potentially valuable information. So the first step is to evaluate
the first argument of print. In some sense, what we'd like to say is that
print has the evaluation strategy strict(1). However, strictness
attributes only work with individual language constructs, while what we need
is an evaluation strategy that involves two constructs: print and the list
(comma) construct of AExps. If we naively associate the strict(1)
evaluation strategy with print then its first and unique argument, an AExps
list, will be scheduled for evaluation and the execution will get stuck
because we have no rules for evaluating AExps terms. If we make the list
construct of AExps strict then we get the wrong semantics for print, which
first evaluates all its arguments and then outputs them. The correct way to
tell K that print should evaluate only its first argument is by using a
context declaration:
context print(HOLE:AExp, _);
Note the HOLE
of sort AExp
above. Contexts allow us to define finer-grain
evaluation strategies than the strictness attributes, involving potentially
more than one language construct, like above. The HOLE
indicates the
argument which is requested to be evaluated. For example, the strict
attribute of division corresponds to two contexts:
context HOLE / _
context _ / HOLE
In their full generality, contexts can be any terms with precisely one
occurrence of a HOLE, and with arbitrary side conditions on any variables
occurring in the context term as well as on the HOLE. See Part 6 of the
tutorial for more examples.
Once evaluated, the first argument of print
is expected to become either an
integer or a string. Since we want to print both integers and string values,
to avoid writing two rules, one for each type of value, we instead add a new
syntactic category, Printable, which is the union of integers and strings.
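With Printable in place, the semantics of print could be sketched as below. The explicit subsort of Printable into AExp is an assumption (some versions of K may need it, others may infer it), and the variable names P and AEs are illustrative.

```k
syntax Printable ::= Int | String
// Assumption: Printable may need to be subsorted to AExp explicitly.
syntax AExp ::= Printable

// Once the first argument is evaluated, append it to the output list and
// continue with the remaining arguments.
rule <k> print(P:Printable,AEs => AEs); ...</k>
     <out>... .List => ListItem(P) </out>

// Nothing left to print: the statement dissolves.
rule print(.AExps); => .K
```

Together with the context declaration above, each argument is evaluated and output before the next one is touched, so print("Hello", 3/0, "Bye"); emits "Hello" before getting stuck.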
Let us kompile and krun the io.imp program discussed in Lesson 1. As
expected, it gets stuck with a read construct on top of the computation and
with an empty <in/> cell. To run it, we need to provide some items in the
<in/> cell, so that the rule of read can match. Let us add
<in> ListItem(3) ListItem(5) ListItem(7) </in>
Now, if we krun io.imp, we can see that its execution completes normally
(the <k/> cell is empty), that the first two items have been removed by the
two read constructs from the <in/> cell, and that the desired strings and
numbers have been placed into the <out/> cell.
Cells holding semantic lists can be connected to the standard input and
standard output buffers, and krun
knows how to handle these appropriately.
Let us connect the <in/> cell to the standard input using the cell attribute
stream="stdin" and the <out/> cell to the standard output with the
attribute stream="stdout". A cell connected to the standard input will
take its items from the standard input and block the rewriting process when
an input is needed until an item is available in the standard input buffer.
A cell connected to the standard output buffer will send all its items, in
order, to the standard output.
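In the configuration, the two cell declarations would then read as in this fragment (the colors are the ones used when the cells were introduced):

```k
<in color="magenta" stream="stdin"> .List </in>
<out color="Orchid" stream="stdout"> .List </out>
```

The initial contents stay .List; the items now arrive from, and leave to, the attached buffers.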
Let us kompile and krun io.imp again. It prints the message and then
waits for your input numbers. Type in two numbers, then press <Enter>.
A message with their sum is then printed, followed by the final configuration.
If you do not want to see the final configuration, and thus obtain a realistic
interpreter for our language, then call krun with the option --output none:
krun io.imp --output none
Let us now krun
our interactive sum program, which continuously reads numbers
from the console and prints the sum of numbers up to them:
krun sum-io.imp
Try a few numbers, then 0. Note that the program terminated, but with junk
in the <k/> cell, essentially with a halt statement on its top. This is
expected, because halt has been reached and it has no semantics yet.
In the next lesson we give the semantics of halt
and also fix the semantics
of blocks with local variable declarations.
Go to Lesson 5, IMP++: Deleting, Saving and Restoring Cell Contents.
In this lesson we will see how easily we can delete, save and/or restore
contents of cells in order to achieve the desired semantics of language
constructs that involve abrupt changes of control or environments. We have
seen similar or related K features in the LAMBDA++ language in Part 3 of the
tutorial.
Let us start by adding semantics to the halt statement. As its name says,
what we want is to abruptly terminate the execution of the program. Moreover,
we want the program configuration to look as if the program terminated
normally, with an empty computation cell. The simplest way to achieve that is
to simply empty the computation cell when halt
is encountered:
rule <k> halt; ~> _ => . </k>
It is important to mention the entire <k/>
cell here, with both its membranes
closed, to make sure that its entire contents is discarded. Note the
anonymous variable, which matches the rest of the computation.
kompile and krun sum-io.imp. Note that unlike in Lesson 4, the program
terminates with an empty computation cell now.
As mentioned earlier, the semantics of blocks that was inherited from IMP is
wrong. Program locals.imp
shows it very clearly: the environments are not
correctly restored at block exits. One way to fix the problem is to take
a snapshot of the current environment when a block is entered and save it
somewhere, and then to restore it when the block is left. There are many
ways to do this, which you can explore on your own: for example you can add
a new list cell for this task where to push/pop the environment snapshots in
a stack style; or you can use the existing environment cell for this purpose,
but then you need to change the variable access rules to search through the
stacked environments for the variable.
My preferred solution is to follow a style similar to how we saved/restored
LAMBDA++ environments in Part 3 of the Tutorial, namely to use the already
existing <k/>
cell for such operations. More specifically, we place a
reminder item in the computation whenever we need to take a snapshot of
some cell contents; the item simply consists of the entire contents of the cell.
Then, when the reminder item is reached, we restore the contents of the cell:
rule <k> {S} => S ~> Rho ...</k> <env> Rho </env>
The only thing left now is to give the definition of environment restore:
rule <k> Rho => . ...</k> <env> _ => Rho </env>
Done. kompile and krun locals.imp. Everything should work correctly now.
Note that the rule above is different from the one we had for LAMBDA++ in
Part 3 of the tutorial, in that here there is no value preceding the environment
restoration item in the computation; that's because IMP++ statements,
unlike LAMBDA++'s expressions, evaluate to nothing (.).
In the next lesson we will give semantics to the spawn S construct, which
dynamically creates a concurrent shared-memory thread executing statement S.
Go to Lesson 6, IMP++: Adding/Deleting Cells Dynamically; Configuration Abstraction, Part 2.
In this lesson we add dynamic thread creation and termination to IMP, and
while doing so we learn how to define and use configurations whose structure
can evolve dynamically.
Recall that the intended semantics of spawn S
is to spawn a new concurrent
thread that executes S
. The new thread is being passed at creation time
its parent's environment, so it can share with its parent the memory
locations that its parent had access to at creation time. No other locations
can be shared, and no other memory sharing mechanism is available.
The parent and the child threads can evolve unrestricted, in particular they
can change their environments by declaring new variables or shadowing existing
ones, can create other threads, and so on.
The above suggests that each thread should have its own computation and its
own environment. This can be elegantly achieved if we group the <k/> and
<env/> cells in a <thread/> cell in the configuration. Since at any given
moment during the execution of a program there could be zero, one or more
instances of such a <thread/> cell in the configuration, it is a good idea
to declare the <thread/> cell with multiplicity * (i.e., zero, one or more):
<thread multiplicity="*" color="blue">
  <k color="green"> $PGM:Stmt </k>
  <env color="LightSkyBlue"> .Map </env>
</thread>
This multiplicity declaration is not necessary, but it is a good idea to do
it for several reasons:
For good encapsulation, I also prefer to put all thread cells into one cell,
<threads/>. This is technically unnecessary, though; to convince yourself
that this is indeed the case, you can remove this cell once we are done with
the semantics and everything will work without having to make any changes.
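Putting the pieces together, the whole configuration at this point might look like the following sketch. The top-level <T/> cell and the colors of <T/> and <threads/> are assumptions for illustration; the remaining cells are the ones introduced in the previous lessons. Some versions of K may additionally expect a type attribute on cells with multiplicity *, which is not shown here.

```k
configuration
  <T color="yellow">                          // top cell name assumed
    <threads color="orange">                  // color assumed
      <thread multiplicity="*" color="blue">
        <k color="green"> $PGM:Stmt </k>
        <env color="LightSkyBlue"> .Map </env>
      </thread>
    </threads>
    <store color="red"> .Map </store>
    <in color="magenta" stream="stdin"> .List </in>
    <out color="Orchid" stream="stdout"> .List </out>
  </T>
```

Only <k/> and <env/> are per thread; the store and the input/output cells stay shared, which is what lets threads communicate through memory.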
Before we continue, let us kompile and krun some programs that used to
work, say sum-io.imp. In spite of the relatively radical configuration
reorganization, those programs execute just fine! How is that possible?
In particular, why do rules like the lookup and assignment still work,
unchanged, in spite of the fact that the <k/> and <env/> cells are not at
the same level with the <store/> cell in the configuration anymore?
Welcome to configuration abstraction, part 2. Recall that the role of
configuration abstraction is to allow you to only write the relevant
information in each rule, and have the compiler fill-in the obvious and boring
details. According to the configuration that we declared for our new
language, there is only one reasonable way to complete rules like the lookup,
namely to place the <k/> and <env/> cells inside a <thread/> cell,
inside a <threads/> cell:
rule <threads>...
       <thread>...
         <k> X:Id => I ...</k>
         <env>... X |-> N ...</env>
       ...</thread>
     ...</threads>
     <store>... N |-> I ...</store> [lookup]
This is the most direct, compact and local way to complete the configuration
context of the lookup rule. If for some reason you wanted here to match the
<k/> cell of one thread and the <env/> cell of another thread, then you
would need to explicitly tell K so, by mentioning the two thread cells,
for example:
rule <thread>... <k> X:Id => I ...</k> ...</thread>
     <thread>... <env>... X |-> N ...</env> ...</thread>
     <store>... N |-> I ...</store> [lookup]
By default, K completes rules in a greedy style. Think this way: what is the
minimal number of changes to my rule to make it fit the declared
configuration? That's what the K tool will do.
Configuration abstraction is technically unnecessary, but once you start
using it and get a feel for how it works, it will become your best friend.
It allows you to focus on the essentials of your semantics, and at the same
time gives you flexibility in changing the configuration later on without
having to touch the rules. For example, it allows you to remove the
<threads/> cell from the configuration, if you don't like it, without
having to touch any rule.
We are now ready to give the semantics of spawn:
rule <k> spawn S => . ...</k> <env> Rho </env>
     (. => <thread>... <k> S </k> <env> Rho </env> ...</thread>)
Note configuration abstraction at work, again. Taking into account
the declared configuration, and in particular the multiplicity information
* in the <thread/> cell, the only reasonable way to complete the rule
above is to wrap the first <k/> and <env/> cells of the rule within a
<thread/> cell, and to fill-in the ...s in the child thread with the
default contents of the other subcells in <thread/>. In this case there
are no other cells, so we can get rid of those ...s, but that would
decrease the modularity of this rule: indeed, we may later on add other
cells within <thread/> as the language evolves, for example a function
or an exception stack, etc.
In theory, we should be able to write the rule above even more compactly
and modularly, namely as
rule <k> spawn S => . ...</k> <env> Rho </env>
     (. => <k> S </k> <env> Rho </env>)
Unfortunately, this currently does not work in the K tool, due to some
known limitations of our current configuration abstraction algorithm.
This latter rule would be more modular, because it would not even depend
on the cell name thread. For example, we may later decide to change
thread into agent, and we would not have to touch this rule.
We hope this current limitation will be eliminated soon.
Once a thread terminates, its computation cell becomes empty. When that
happens, we can go ahead and remove the useless thread cell:
rule <thread>... <k> . </k> ...</thread> => .
Let's see what we've got. kompile and krun spawn.imp.
Note the following:
the <threads/> cell is empty, so all threads terminated normally;
Therefore, interesting behaviors may happen; we would like to see them all!
krun spawn.imp --search
However, the above does not work.
spawn.imp is an interactive program, which reads a number from the
standard input. When analyzing programs exhaustively using the search option,
krun has to disable the streaming capabilities (just think about it and you
will realize why). The best you can do in terms of interactivity with search
is to pipe some input to krun: krun will flush the standard input buffer
into the cells connected to it when creating the initial configuration (will
do that no matter whether you run it with or without the --search option).
For example:
echo 23 | krun spawn.imp --search
puts 23 in the standard input buffer, which is then transferred into the
<in/> cell as a list item, and then the exhaustive search procedure is
invoked.
However, even after piping some input, the spawn.imp
program outputs
an error:
[Error] krun: You must pass --enable-search to kompile to be able to use krun --search with the LLVM backend
As explained in Lesson 3, by default kompile
optimizes the generated
language model for execution. In particular, it does not insert any
backtracking markers where transition attempts should be made, so krun
lacks the information it needs to exhaustively search the generated language
model.
We therefore need to kompile with the search feature enabled:
kompile imp --enable-search
Now echo 23 | krun spawn.imp --search
gives us all 12 behaviors of the
spawn.imp
program.
We currently have no mechanism for thread synchronization. In the next lesson
we add a join
statement, which allows a thread to wait until another completes.
Go to Lesson 7, IMP++: Everything Changes: Syntax, Configuration, Semantics.
In this lesson we add thread joining, one of the simplest thread
synchronization mechanisms. In doing so, we need to add unique ids
to threads in the configuration, and to modify the syntax to allow spawn
to return the id of the newly created thread. This gives us an opportunity
to make several other small syntactic and semantics changes to the language,
which make it more powerful or more compact at a rather low cost.
Before we start, let us first copy and modify the previous spawn.imp
program
from Lesson 1 to make use of thread joining. Recall from Lesson 6 that in some
runs of this program the main thread completed before the child threads,
printing a possibly undesired value of x
. What we want now is to assign
unique ids to the two spawned threads, and then to modify the main thread to
join the two child threads before printing. To avoid adding a new type to
the language, let's assume that thread ids are integer numbers. So we declare
two integers, t1
and t2
, and assign them the two spawn commands. In order
for this to parse, we will have to change the syntax of spawn
to be an
arithmetic expression construct instead of a statement. Once we do that,
we have a slight syntactic annoyance: we need to put two consecutive ;
after the spawn assignment, one for the assignment statement inside the spawn,
and another for the outer assignment. To avoid the two consecutive semicolons,
we can syntactically enforce spawn to take a block as argument, instead of a
statement. Now it looks better. The new spawn.imp
program is still
non-deterministic, because the two threads can execute in any order and even
continue to have a data-race on the shared variable x
, but we should see fewer
behaviors when we use the join
statements. If we want to fully synchronize
this program, we can have the second thread start with a join(t1)
statement.
Then we should only see one behavior for this program.
Let us now modify the language semantics. First, we move the spawn
construct from statements to expressions, and make it take a block.
Second, we add one more sub-cell to the thread cell in the configuration,
<id/>
, to hold the unique identifier of the thread. We want the main
thread to have id 0
, so we initialize this cell with 0
. Third, we modify
the spawn rule to generate a fresh integer identifier, which is put in the
<id/>
cell of the child thread and returned as a result of spawn
in the
parent thread. Fourth, let us add the join
statement to the language,
both syntactically and semantically. So in order for the join(T)
statement
to execute, thread T
must have its computation empty. However, in order
for this to work we have to get rid of the thread termination cleanup rule.
Indeed, we need to store somewhere the information that thread T
terminated;
the simplest way to do it is to not remove the terminated threads. Feel free
to experiment with other possibilities, too, here. For example, you may add
another cell, <done/>
, in which you can store all the thread ids of the
terminated and garbage-collected threads.
Let us now kompile imp.k
and convince ourselves that the new spawn.imp
with join
statements indeed has fewer behaviors than its variant without
join
statements. Also, let us convince ourselves that the fully synchronized
variant of it indeed has only one behavior.
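The effect of join on the number of behaviors can be illustrated with a small Python model. This is only a sketch under stated assumptions, not K code and not the actual spawn.imp program: the thread ids ("main", "t1", "t2"), the step names ("add", "mul", "print", "join") and the function behaviors are all hypothetical, and each step is treated as atomic. It enumerates every interleaving, much as krun --search does for the real definition, and lets a join step fire only once the joined thread has an empty computation.

```python
def behaviors(threads, x0):
    """Return the set of outputs printed over all interleavings.

    `threads` maps a thread id to its list of atomic steps (op, arg);
    the shared variable x starts at x0.
    """
    results = set()

    def explore(pcs, x, out):
        progressed = False
        for tid, prog in threads.items():
            pc = pcs[tid]
            if pc == len(prog):
                continue  # this thread has an empty computation
            op, arg = prog[pc]
            nx, nout = x, out
            if op == "join":
                # join(T) can only fire once thread T has terminated
                if pcs[arg] < len(threads[arg]):
                    continue
            elif op == "add":
                nx = x + arg
            elif op == "mul":
                nx = x * arg
            elif op == "print":
                nout = out + (x,)
            else:
                raise ValueError(f"unknown step {op!r}")
            progressed = True
            explore({**pcs, tid: pc + 1}, nx, nout)
        if not progressed:
            results.add(out)  # final (or deadlocked) configuration

    explore({tid: 0 for tid in threads}, x0, ())
    return results


# Hypothetical toy programs: two children race on x, main prints x.
no_join = {"main": [("print", None)],
           "t1": [("add", 1)], "t2": [("mul", 2)]}
with_join = {"main": [("join", "t1"), ("join", "t2"), ("print", None)],
             "t1": [("add", 1)], "t2": [("mul", 2)]}
synced = {"main": [("join", "t1"), ("join", "t2"), ("print", None)],
          "t1": [("add", 1)], "t2": [("join", "t1"), ("mul", 2)]}
```

Starting from x = 1, the toy program without joins has four distinct outputs, the version where main joins both children has two, and the fully synchronized version has exactly one, mirroring the progression described above.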
Note that now spawn, like variable increment, gives the evaluation of
expressions side effects. Many programming languages in fact allow
expressions to be evaluated only for their side effects, and not for their
value. This is typically done by simply adding a ; after the expression
and thus turning it into a statement. For example, ++x;. Let us also
allow arithmetic expressions in our language to be used as statements, by
simply adding the production AExp ";" to Stmt, with evaluation strategy
strict and with the expected semantics discarding the value of the AExp.
Another simple change in syntax and semantics which gives our language more
power, is to remove the ;
from the syntax of variable assignments and to make
them expression instead of statement constructs. This change, combined with
the previous one, will still allow us to parse all the programs that we could
parse before, but will also allow us to parse more programs. For example, we
can now do sequence assignments like in C: x = y = z = 0
. The semantics
of assignment now has to return the assigned value also to the computation,
because we want the assignment expression to evaluate to the assigned value.
Let us also make another change, but this time one which only makes the
definition more compact. Instead of defining statement sequential
composition as a binary construct for statements, let us define a new
syntactic construct, Stmts
, as whitespace-separated lists of Stmt
. This
allows us to get rid of the empty blocks, because we can change the syntax of
blocks to {Stmts}
and Stmts
also allows the empty sequence of statements.
However, we do have to make sure that .Stmts
dissolves.
In general, unless you are defining a well-established programming language,
it is quite likely that your definitions will undergo many changes like the
ones seen in this lesson. You add a new construct, which suggests changes
in the existing syntax that in fact make your language parse more programs,
which then requires corresponding changes in the semantics, and so on.
Also, compact definitions are desirable in general, because they are easier
to read and easier to change if needed later.
In the next lesson we wrap up and document the definition of IMP++.
In this lesson we wrap up IMP++'s semantics and also generate its poster.
While doing so, we also learn how to display larger configurations in order
to make them easier to read and print.
Note that we rearrange the semantics a bit, to group the semantics of old
IMP's constructs together, and separate it from the new IMP++'s semantics.
You can go even further and manually edit the generated LaTeX document.
You typically want to do that when you want to publish your language
definition, or parts of it, and you need to finely tune it to fit the
editing requirements. For example, you may want to insert some negative
spaces, etc.
Part 4 of the tutorial is now complete. At this moment you should know most
of the K framework's features and how to use the K tool. You can now define or
design your own programming languages, and then execute and analyze programs.
In this part of the tutorial we will show that defining type systems for
languages is essentially no different from defining semantics. The major
difference is that programs and fragments of programs now rewrite to their
types, instead of to concrete values. In terms of K, we will learn how
to use it for a particular but important kind of application.
In this lesson you learn how to define a type system for an imperative
language (the IMP++ language defined in Part 4 of the tutorial), using a style
based on type environments.
Let us copy the imp.k
file from Part 4 of the tutorial, Lesson 7, which holds
the semantics of IMP++, and modify it into a type system. The resulting type
system, when executed, yields a type checker.
We start by defining the new strictness attributes of the IMP++ syntax.
While doing so, remember that programs and fragments of programs now reduce
to their types. So types will be the new results of our new (type) semantics.
We also clean up the semantics by removing the unnecessary tags, and also
use strict
instead of seqstrict
wherever possible, because strict
gives
implementations more freedom. Interestingly, note that spawn
is strict now,
because the code of the child thread should type in the current parent's type
environment. Note that this is not always the case for threads, see for example
SIMPLE in the languages tutorial, but it works here for our simpler IMP++.
From a typing perspective, the &&
construct is strict in both its arguments;
its short-circuit (concrete) semantics is irrelevant for its (static) type
system. Similarly, both the conditional and the while loop are strict
constructs when regarded through the typing lens.
Finally, the sequential composition is now sequentially strict! Indeed,
statements are now going to reduce to their type, stmt
, and it is critical
for sequential composition to type its argument statements left-to-right;
for example, imagine that the second argument is a variable declaration (whose
type semantics will modify the type environment).
We continue by defining the new results of computations, that is, the actual
types. In this simple imperative language, we only have a few constant types:
int
, bool
, string
, block
and stmt
.
We next define the new configuration, which is actually quite simple. Besides
the <k/>
cell, all we need is a type environment cell, <tenv/>
, which will
hold a map from identifiers to their types. A type environment is therefore
like a state in the abstract domain of type values.
Let us next modify the semantic rules, turning them into a type system. In
short, the idea is to reduce the basic values to their types, and then have a
rule for each language construct reducing it to its result type whenever its
arguments have the expected types.
We write the rules in the order given by the syntax declarations, to make
sure we do not forget any construct.
Integers reduce to their type, int
.
So do the strings.
Variables are now looked up in the type environment and reduced to their type
there. Since we only declare integer variables in IMP++, their type in tenv
will always be int
. Nevertheless, we write the rule generically, so that we
would not have to change it later if we add other type declarations to IMP++.
Note that we reject programs which look up undeclared variables. Rejection,
in this case, means that rewriting gets stuck.
Variable increment types to int
, provided the variable has type int
.
Read types to int
, because we only allow integer input.
Division is only allowed on integers, so it rewrites to int
provided that its
arguments rewrite to int
. Note, however, that in order to write int / int
,
we have to explicitly add int
to the syntax of arithmetic expressions.
Otherwise, the K parser rightfully complains, because /
was declared on
arithmetic expressions, not on types. One simple and generic way to allow
types to appear anywhere, is to define Type
as a syntactic subcategory of all
the other syntactic categories. Let's do it on a by-need basis, though.
Addition is overloaded, so we add two typing rules for it: one for integers
and another for strings.
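The expression rules described so far can be mimicked by a small Python function. This is a hedged sketch rather than the K definition: the tuple encoding of expressions, the Stuck exception and the name type_of are assumptions made for illustration. A raised Stuck stands for a rewrite that gets stuck, that is, a rejected program.

```python
class Stuck(Exception):
    """Models a rewrite that cannot proceed: the program is rejected."""


def type_of(exp, tenv):
    """Reduce an expression to its type, using the type environment tenv."""
    kind = exp[0]
    if kind == "int":                 # integers reduce to int
        return "int"
    if kind == "str":                 # strings reduce to string
        return "string"
    if kind == "read":                # read() only yields integers
        return "int"
    if kind == "var":                 # generic lookup in the type environment
        if exp[1] not in tenv:
            raise Stuck(f"undeclared variable {exp[1]}")
        return tenv[exp[1]]
    if kind == "inc":                 # ++x requires x to have type int
        if tenv.get(exp[1]) == "int":
            return "int"
        raise Stuck(f"cannot increment {exp[1]}")
    if kind == "div":                 # int / int => int
        if type_of(exp[1], tenv) == "int" and type_of(exp[2], tenv) == "int":
            return "int"
        raise Stuck("division needs two int arguments")
    if kind == "add":                 # overloaded: int + int, string + string
        left, right = type_of(exp[1], tenv), type_of(exp[2], tenv)
        if left == right and left in ("int", "string"):
            return left
        raise Stuck("addition needs two int or two string arguments")
    raise Stuck(f"unknown expression {exp!r}")
```

For instance, type_of(("add", ("int", 1), ("var", "x")), {"x": "int"}) yields "int", whereas adding an integer to a string, or looking up a variable missing from tenv, raises Stuck.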
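As discussed, spawn is strict, so its block argument is typed first. Since
spawn is now an arithmetic expression that returns the integer id of the new
thread, it types to int provided that its argument types to block.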
The assignment construct was strict(2)
; its typing policy is that the declared
type of X
should be identical to the type of the assigned value. Like for
lookup, we define this rule more generically than needed for IMP++, for any
type, not only for int
.
The typing rules for Boolean expression constructs are in the same spirit.
Note that we need only one rule for &&
.
The typing of blocks is a bit trickier. First, note that we still need to
recover the environment after the block is typed, because we do not want the
block-local variables to be visible in the outer type environment. We recover
the type environment only after the block-enclosed statements type; moreover,
we also opportunistically yield a block
type on the computation when we
discard the type environment recovery item. To account for the fact that the
block-enclosed statement can itself be a block (e.g., {{S}}
), we would need an
additional rule. Since we do not like repetition, we instead group the types
block
and stmt
into one syntactic category, BlockOrStmtType
, and now we
can have only one rule. We also include BlockOrStmtType
in Type
, as a
replacement for the two basic types.
The expression statement types as expected. Recall that we only allow
arithmetic expressions, which type to int
, to be used as statements in IMP++.
The conditional was declared strict
in all its arguments. Its typing policy
is that its first argument types to bool
and its two branches to block
.
If that is the case, then it yields a stmt
type.
For while
, its first argument should type to bool
and its second to block
.
Variable declarations add new bindings to the type environment. Recall that
we can only declare variables of integer type in IMP++.
The typing policy of print
is that it can only print integer or string values,
and in that case it types to stmt
. Like for BlockOrStmtType
, to avoid
having two similar rules, one for int
and another for string
, we prefer to
introduce an additional syntactic category, PrintableType
, which includes both
int
and string
types.
halt
types to stmt
; so its subsequent code is also typed.
join
types to stmt
, provided that its argument types to int
.
Sequential composition was declared as a whitespace-separated sequentially
strict list. Its typing policy is that all the statements in the list should
type to stmt
or block
in order for the list to type to stmt
. Since
lists are maintained internally as cons-lists, this is probably the simplest
way to do it:
rule .Stmts => stmt
rule _:BlockOrStmtType Ss => Ss
Note that the first rule, which types the empty sequence of statements to stmt
,
is needed anyway, to type empty blocks {}
(together with the block rule).
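The interplay between declarations, blocks and statement lists can be sketched in Python as follows. This is an illustrative model under assumptions, not the K rules themselves: the tuple encoding, the Stuck exception and the function names are hypothetical, and only integer literals and variables are supported as expressions. The point is that a block saves the type environment, types its body left to right, and then recovers the saved environment.

```python
class Stuck(Exception):
    """Models a rewrite that cannot proceed: the program is rejected."""


def type_exp(exp, tenv):
    kind = exp[0]
    if kind == "int":
        return "int"
    if kind == "var":
        if exp[1] not in tenv:
            raise Stuck(f"undeclared variable {exp[1]}")
        return tenv[exp[1]]
    raise Stuck(f"unknown expression {exp!r}")


def type_stmt(stmt, tenv):
    kind = stmt[0]
    if kind == "decl":                # int x;  adds a binding to tenv
        tenv[stmt[1]] = "int"
        return "stmt"
    if kind == "expr":                # e;  the expression must type to int
        if type_exp(stmt[1], tenv) != "int":
            raise Stuck("expression statement must type to int")
        return "stmt"
    if kind == "block":               # { Ss }
        saved = dict(tenv)            # remember the outer type environment
        type_stmts(stmt[1], tenv)
        tenv.clear()
        tenv.update(saved)            # recover it once the body has typed
        return "block"
    raise Stuck(f"unknown statement {stmt!r}")


def type_stmts(stmts, tenv):
    """Type a statement list left to right; the list types to stmt."""
    for s in stmts:
        if type_stmt(s, tenv) not in ("stmt", "block"):  # BlockOrStmtType
            raise Stuck("statement did not type to stmt or block")
    return "stmt"                     # also covers the empty list
```

With this model, a variable declared inside a block is no longer visible after the block, so using it there raises Stuck, while the empty block types to block and the empty list to stmt.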
kompile
imp.k
and krun
all the programs in Part 4 of the tutorial. They
should all type to stmt
.
In the next lesson we will define a substitution-based type system for LAMBDA.
Go to Lesson 2, Type Systems: Substitution-Based Higher-Order Type Systems.
In this lesson you learn how to define a substitution-based type system for
a higher-order language, namely the LAMBDA language defined in Part 1 of the
tutorial.
Let us copy the definition of LAMBDA from Part 1 of the tutorial, Lesson 8.
We are going to modify it into a type system for LAMBDA.
Before we start, it is worth clarifying an important detail, namely that
our type system will yield a type checker when executed, not a type
inferencer. In particular, we are going to change the LAMBDA syntax
to allow us to associate a type to each declared variable. The
constructs which declare variables are lambda
, let
, letrec
and mu
.
The syntax of all these will therefore change.
Since here we are not interested in a LAMBDA semantics anymore, we take the
freedom to eliminate the Val
syntactic category, our previous results.
Our new results are going to be the types, because programs will now reduce
to their types.
As explained, the syntax of the lambda
construct needs to change, to also
declare the type of the variable that it binds. We add the new syntactic
category Type
, with the following constructs: int
, bool
, the function
type (which gives it its higher-order status), and parentheses as bracket.
Also, we make types our K results.
We are now ready to define the typing rules.
Let us start with the typing rule for lambda abstraction: lambda X : T . E
types to the function type T -> T'
, where T'
is the type obtained by further
typing E[T/X]
. This can be elegantly achieved by reducing the lambda
abstraction to T -> E[T/X]
, provided that we extend the function type construct
to take expressions, not only types, as arguments, and to be strict.
This can be easily achieved by redeclaring it as a strict expression construct
(strictness in the second argument would suffice in this example, but it is
more uniform to define it strict overall).
The typing rule for application is as simple as it can get: (T1->T2) T1 => T2
.
Let us now give the typing rules of arithmetic and Boolean expression
constructs. First, let us get rid of Val
. Second, rewrite each value to its
type, similarly to the type system for IMP++ in the previous lesson. Third,
replace each semantic rule by its typing rule. Fourth, make sure you
do not forget to subsort Type
to Exp
, so your rules above will parse.
The typing policy of the conditional statement is that its first argument
should type to bool
and its other two arguments should type to the same type
T
, which will also be the result type of the conditional. So we make the
conditional construct strict
in all its three arguments and we write the
obvious rule: if bool then T:Type else T => T
. We want a runtime check that
the latter arguments are actually typed, so we write T:Type
.
There is nothing special about let
, except that we have to make sure we
change its syntax to account for the type of the variable that it binds.
This rule is a macro, so the let
is desugared statically.
Similarly, the syntax of letrec
and mu
needs to change to account for the
type of the variable that they bind. The typing of letrec
remains based on
its desugaring to mu
; we have to make sure the types are also included now.
The typing policy of mu
is that its body should type to the same type T
of
its variable, which is also the type of the entire mu
expression. This can
be elegantly achieved by rewriting it to (T -> T) E[T/X]
. Recall that
application is strict, so E[T/X]
will be eventually reduced to its type.
Then the application types correctly only if that type is also T
, and in
that case the result type will also be T
.
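The substitution-based rules for lambda, application and mu can be modeled in a few lines of Python. This is a sketch under assumptions rather than the K definition: expressions are encoded as tuples, a substituted type is wrapped as ("type", T), function types are ("->", T1, T2), and the names subst, check and Stuck are hypothetical.

```python
class Stuck(Exception):
    """Models a rewrite that cannot proceed: the program does not type."""


def subst(e, x, t):
    """Compute e[t/x]: replace free occurrences of variable x by type t."""
    kind = e[0]
    if kind == "var":
        return ("type", t) if e[1] == x else e
    if kind in ("lam", "mu"):
        _, y, ty, body = e
        if y == x:
            return e                  # x is shadowed by the inner binder
        return (kind, y, ty, subst(body, x, t))
    if kind in ("app", "add"):
        return (kind, subst(e[1], x, t), subst(e[2], x, t))
    return e                          # literals and already substituted types


def check(e):
    """Reduce a closed expression to its type."""
    kind = e[0]
    if kind == "type":
        return e[1]
    if kind == "int":
        return "int"
    if kind == "add":
        if check(e[1]) == "int" and check(e[2]) == "int":
            return "int"
        raise Stuck("addition needs two int arguments")
    if kind == "lam":                 # lambda X : T . E  =>  T -> E[T/X]
        _, x, t, body = e
        return ("->", t, check(subst(body, x, t)))
    if kind == "app":                 # (T1 -> T2) T1  =>  T2
        f, a = check(e[1]), check(e[2])
        if isinstance(f, tuple) and f[0] == "->" and f[1] == a:
            return f[2]
        raise Stuck("ill-typed application")
    if kind == "mu":                  # mu X : T . E  =>  (T -> T) E[T/X]
        _, x, t, body = e
        if check(subst(body, x, t)) == t:
            return t
        raise Stuck("body of mu does not have the declared type")
    raise Stuck(f"cannot type {e!r}")
```

For example, ("lam", "x", "int", ("add", ("var", "x"), ("int", 1))) checks to ("->", "int", "int"), and applying it to ("int", 2) checks to "int". Because the substituted types are closed, the model does not need capture-avoiding substitution.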
kompile
and krun
some programs. You can, for example, take the LAMBDA
programs from the first tutorial, modify them by adding types to their
variable declarations, and then type check them using krun
.
In the next lesson we will discuss an environment-based type system
for LAMBDA.
Go to Lesson 3, Type Systems: Environment-Based Higher-Order Type Systems.
In this lesson you learn how to define an environment-based type system for
a higher-order language, namely the LAMBDA language defined in Part 1 of the
tutorial.
The simplest and fastest way to proceed is to copy the substitution-based
type system of LAMBDA from the previous lesson and modify it into an
environment-based one. A large portion of the substitution-based definition
will remain unchanged. We only have to modify the rules that use
substitution.
We do not need the substitution anymore, so we can remove the require and
import statements. The syntax of types and expressions stays unchanged, but
we can now remove the binder
tag of lambda.
Like in the type system of IMP++ in Lesson 1, we need a configuration that
contains, besides the <k/>
cell, a <tenv/>
cell that will hold the type
environment.
In an environment-based definition, unlike in a substitution-based one, we
need to look up variables in the environment. So let us start with the
type lookup rule:
rule <k> X:Id => T ...</k> <tenv>... X |-> T ...</tenv>
The type environment is populated by the semantic rule of lambda
:
rule <k> lambda X : T . E => (T -> E) ~> Rho ...</k> <tenv> Rho => Rho[X <- T] </tenv>
So X
is bound to its type T
in the type environment, and then T -> E
is scheduled for processing. Recall that the arrow type construct has been
extended into a strict expression construct, so E
will be eventually reduced
to its type. Like in other environment-based definitions, we need to make
sure that we recover the type environment after the computation in the scope
of the declared variable terminates.
The typing rule of application does not change, so it stays as elegant as it
was in the substitution-based definition:
rule (T1 -> T2) T1 => T2
So do the rules for arithmetic and Boolean constructs, and those for
if, let, and letrec.
The mu
rule needs to change, because it was previously defined using
substitution. We modify it in the same spirit as we modified the lambda
rule: bind X
to its type in the environment, schedule its body for typing
in its right context, and then recover the type environment.
Finally, we give the semantics of environment recovery, making sure
the environment is recovered only after the preceding computation is
reduced to a type:
rule <k> _:Type ~> (Rho => .) ...</k> <tenv> _ => Rho </tenv>
The changes that we applied to the substitution-based definition were
therefore quite systematic: each substitution invocation was replaced with
an appropriate type environment update/recovery.
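The same systematic change can be seen in a Python model of the environment-based checker. This is a hedged sketch, not the K rules: the tuple encoding and the names check and Stuck are assumptions, and the environment is a mutable dict that is saved before a binder is entered and restored once its body has been typed.

```python
class Stuck(Exception):
    """Models a rewrite that cannot proceed: the program does not type."""


def check(e, tenv):
    """Reduce expression e to its type in the type environment tenv."""
    kind = e[0]
    if kind == "int":
        return "int"
    if kind == "var":                 # type lookup
        if e[1] not in tenv:
            raise Stuck(f"unbound variable {e[1]}")
        return tenv[e[1]]
    if kind == "add":
        if check(e[1], tenv) == "int" and check(e[2], tenv) == "int":
            return "int"
        raise Stuck("addition needs two int arguments")
    if kind in ("lam", "mu"):
        _, x, t, body = e
        saved = dict(tenv)            # Rho, kept aside for recovery
        tenv[x] = t                   # Rho[X <- T]
        body_type = check(body, tenv)
        tenv.clear()
        tenv.update(saved)            # recovery, only after the body typed
        if kind == "lam":
            return ("->", t, body_type)
        if body_type != t:            # mu: the body must type to T
            raise Stuck("body of mu does not have the declared type")
        return t
    if kind == "app":                 # (T1 -> T2) T1  =>  T2
        f, a = check(e[1], tenv), check(e[2], tenv)
        if isinstance(f, tuple) and f[0] == "->" and f[1] == a:
            return f[2]
        raise Stuck("ill-typed application")
    raise Stuck(f"cannot type {e!r}")
```

After a lambda has been typed, the environment passed in is back to what it was, so a variable bound by that lambda is not visible to a sibling expression.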
Go to Lesson 4, Type Systems: A Naive Substitution-Based Type Inferencer.
In this lesson you learn how to define a naive substitution-based type
inferencer for a higher-order language, namely the LAMBDA language
defined in Part 1 of the tutorial.
Unlike in the type checker defined in Lessons 2 and 3, where we had to
associate a type with each declared variable, a type inferencer
attempts to infer the types of all the variables from the way those
variables are used. Let us take a look at this program, say plus.lambda
:
lambda x . lambda y . x + y
Since x
and y
are used in an integer addition context, we can infer
that they must have the type int
and the result of the addition is
also an int
, so the type of the entire expression is int -> int -> int
.
Similarly, the program if.lambda
lambda x . lambda y . lambda z . if x then y else z
can only make sense when x
has type bool
and y
and z
have the same
type, say t
, in which case the type of the entire expression is
bool -> t -> t -> t
. Since the type t
can be anything, we say that
the type of this expression is polymorphic. That means that the code
above can be used in different contexts, where t
can be an int
, a
bool
, a function type int -> int
, and so on.
In the identity.lambda
program
let f = lambda x . x in f 1
f
has such a polymorphic type, which is then applied to an integer,
so this program is type-safe and its type is int
.
A typical polymorphic expression is the composition
lambda f . lambda g . lambda x . g (f x)
which has the type (t1 -> t2) -> (t2 -> t3) -> (t1 -> t3)
, polymorphic
in 3 types.
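Let us now define our naive type inferencer and then we discuss more
examples. The idea is quite simple: we conceptually perform the same
operations as in the type checker defined in Lesson 2, with two important
differences: instead of declaring a type for each bound variable, we
generate a fresh type for it; and instead of checking that the arguments
of a construct have the expected types, we generate type equality
constraints, which are then solved by unification.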
Let us start with the syntax, which is essentially identical to that
of the type checker in Lesson 2, except that bound variables are not
declared a type anymore. Also, to keep things more compact, we put
all the Exp
syntax declarations in one syntax declaration this time.
Before we modify the rules, let us first define our machinery for
adding and solving constraints. First, we require and import the
unification procedure. We do not discuss unification here, but if you
are interested you can consult the unification.k file under
k-distribution/include/kframework/builtin, which contains our current generic
definition of unification, itself written in K. The generic unification
provides a sort, Mgu
, for most-general-unifier, an operation
updateMgu(Mgu,T1,T2)
which updates Mgu
with additional constraints
generated by forcing the terms T1
and T2
to be equal, and an operation
applyMgu(Mgu,T)
which applies Mgu
to term T
. For our use
of unification here, we do not even need to know how Mgu
terms are
represented internally.
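Although the internal representation does not matter for the lesson, a small Python model may help fix intuitions about the two operations. This is only a sketch under assumptions and not the K unification library: an Mgu is modeled as a dict from type variables (strings starting with "?") to types, function types are ("->", T1, T2), and failures are raised as a hypothetical UnificationError instead of being recorded inside the Mgu.

```python
class UnificationError(Exception):
    """Raised when two types cannot be made equal."""


def is_var(t):
    return isinstance(t, str) and t.startswith("?")


def apply_mgu(mgu, t):
    """Model of applyMgu(Mgu, T): replace solved variables in t."""
    if is_var(t):
        return apply_mgu(mgu, mgu[t]) if t in mgu else t
    if isinstance(t, tuple):
        return ("->", apply_mgu(mgu, t[1]), apply_mgu(mgu, t[2]))
    return t


def occurs(v, t):
    """Does type variable v occur inside type t?"""
    if t == v:
        return True
    return isinstance(t, tuple) and (occurs(v, t[1]) or occurs(v, t[2]))


def update_mgu(mgu, t1, t2):
    """Model of updateMgu(Mgu, T1, T2): return an extended mgu."""
    t1, t2 = apply_mgu(mgu, t1), apply_mgu(mgu, t2)
    if t1 == t2:
        return mgu
    if is_var(t1):
        if occurs(t1, t2):
            raise UnificationError(f"cycle: {t1} occurs in {t2}")
        return {**mgu, t1: t2}
    if is_var(t2):
        return update_mgu(mgu, t2, t1)
    if isinstance(t1, tuple) and isinstance(t2, tuple):
        mgu = update_mgu(mgu, t1[1], t2[1])
        return update_mgu(mgu, t1[2], t2[2])
    raise UnificationError(f"clash between {t1} and {t2}")
```

For example, after m = update_mgu({}, "?t", "int"), apply_mgu(m, ("->", "?t", "?t")) gives ("->", "int", "int"). Constraining "?t" to "bool" afterwards fails with a clash, and unifying "?t" with ("->", "?t", "?t") fails with a cycle.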
We define a K item construct, =
, which takes two Type
terms and
enforces them to be equal by means of updating the current Mgu
.
Once the constraints are added to the Mgu
, the equality dissolves
itself. With this semantics of =
in mind, we can now go ahead and
modify the rules of the type checker systematically into rules
for a type inferencer. The changes are self-explanatory and
mechanical: for example, the rule
rule int * int => int
changes into rule
rule T1:Type * T2:Type => T1 = int ~> T2 = int ~> int
generating the constraints that the two arguments of multiplication
have the type int
, and the result type is int
. Recall that each type
equality on the <k/>
cell updates the current Mgu
appropriately and
then dissolves itself; thus, the above says that after imposing the
constraints T1=int
and T2=int
, multiplication yields a type int
.
As mentioned above, since types of variables are not declared anymore,
but inferred, we have to generate a fresh type for each variable at its
declaration time, and then generate appropriate constraints for it.
For example, the type semantics of lambda
and mu
become:
rule lambda X . E => T -> E[T/X] when fresh(T:Type)
rule mu X . E => (T -> T) E[T/X] when fresh(T:Type)
that is, we add a condition stating that the previously declared type
is now a fresh one. This type will be further constrained by how the
variable X
is being used within E
.
Interestingly, the previous typing rule for lambda application is not
powerful enough anymore. Indeed, since types are not given anymore,
it may very well be the case that the inferred type of the first
argument of the application construct is not yet a function type
(remember, for example, the program composition.lambda above). What
we have to do is to enforce it to be a function type, by means of
fresh types and constraints. We can introduce a fresh type for the
result of the application, and then write the expected rule as
follows:
rule T1:Type T2:Type => T1 = (T2 -> T) ~> T when fresh(T:Type)
The conditional requires that its first argument is a bool
and its
second and third arguments have the same type, which is also the
result type.
The macros do not change, in particular let
is desugared into lambda
application. We will next see that this is a significant restriction,
because it limits the polymorphism of our type system.
We are done. We have a working type inferencer for LAMBDA.
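To see the constraint-generation idea end to end, here is a compact Python model of a naive inferencer. It is a sketch under several assumptions and not the K definition: it uses an environment dict instead of substitution for brevity, type variables are strings starting with "?", the mgu is a dict that is updated in place, and all names (infer, infer_type, unify, InferenceError) are hypothetical. As in the text, let is desugared into a lambda application.

```python
import itertools


class InferenceError(Exception):
    pass


_ids = itertools.count()


def fresh():
    return f"?t{next(_ids)}"


def is_var(t):
    return isinstance(t, str) and t.startswith("?")


def resolve(mgu, t):
    if is_var(t):
        return resolve(mgu, mgu[t]) if t in mgu else t
    if isinstance(t, tuple):
        return ("->", resolve(mgu, t[1]), resolve(mgu, t[2]))
    return t


def occurs(v, t):
    return t == v or (isinstance(t, tuple) and (occurs(v, t[1]) or occurs(v, t[2])))


def unify(mgu, a, b):
    """Add the constraint a = b to mgu (plays the role of the = item)."""
    a, b = resolve(mgu, a), resolve(mgu, b)
    if a == b:
        return
    if is_var(a):
        if occurs(a, b):
            raise InferenceError("cycle")
        mgu[a] = b
    elif is_var(b):
        unify(mgu, b, a)
    elif isinstance(a, tuple) and isinstance(b, tuple):
        unify(mgu, a[1], b[1])
        unify(mgu, a[2], b[2])
    else:
        raise InferenceError(f"clash between {a} and {b}")


def infer(e, tenv, mgu):
    kind = e[0]
    if kind in ("int", "bool"):
        return kind
    if kind == "var":
        if e[1] not in tenv:
            raise InferenceError(f"unbound variable {e[1]}")
        return tenv[e[1]]
    if kind == "lam":                 # fresh type for the bound variable
        t = fresh()
        return ("->", t, infer(e[2], {**tenv, e[1]: t}, mgu))
    if kind == "app":                 # T1 = (T2 -> T) ~> T, with T fresh
        t1, t2, t = infer(e[1], tenv, mgu), infer(e[2], tenv, mgu), fresh()
        unify(mgu, t1, ("->", t2, t))
        return t
    if kind == "add":                 # T1 = int ~> T2 = int ~> int
        unify(mgu, infer(e[1], tenv, mgu), "int")
        unify(mgu, infer(e[2], tenv, mgu), "int")
        return "int"
    if kind == "if":
        unify(mgu, infer(e[1], tenv, mgu), "bool")
        t = infer(e[2], tenv, mgu)
        unify(mgu, t, infer(e[3], tenv, mgu))
        return t
    if kind == "let":                 # let x = e1 in e2  is  (lambda x . e2) e1
        return infer(("app", ("lam", e[1], e[3]), e[2]), tenv, mgu)
    raise InferenceError(f"cannot type {e!r}")


def infer_type(e):
    mgu = {}
    return resolve(mgu, infer(e, {}, mgu))
```

In this model, ("lam", "x", ("lam", "y", ("add", ("var", "x"), ("var", "y")))) is inferred to have type ("->", "int", ("->", "int", "int")), matching the plus.lambda discussion. Because let is only a macro, a let-bound identity function that is used at two different types leads to a clash.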
Let's kompile
it and krun
the programs above. They all work as
expected. Let us also try some additional programs, to push it to its
limits.
First, let us test mu
by means of a letrec
example:
letrec f x = 3 in f
We can also try all the programs that we had in our first tutorial, on
lambda, for example the factorial program:
letrec f x = if x <= 1 then 1 else (x * (f (x + -1))) in (f 10)
Those programs are simple enough that they should all work as
expected with our naive type inferencer here.
Let us next try to type some tricky programs, which involve more
complex and indirect type constraints.
tricky-1.lambda
:
lambda f . lambda x . lambda y . ( (f x y) + x + (let x = y in x) )
tricky-2.lambda
:
lambda x . let f = lambda y . if true then y else x in (lambda x . f 0)
tricky-3.lambda
:
lambda x . let f = lambda y . if true then x 7 else x y in f
tricky-4.lambda
:
lambda x . let f = lambda x . x in let d = (f x) + 1 in x
tricky-5.lambda
:
lambda x . let f = lambda y . x y in let z = x 0 in f
It is now time to see the limitations of this naive type inferencer.
Consider the program
let id = lambda x . x in if (id true) then (id 1) else (id 2)
Our type inferencer fails gracefully with a clash in the <mgu/> cell
between int and bool. Indeed, the desugaring macro of let turns it
into a lambda
and an application, which further enforce id
to have a
type of the form t -> t
for some fresh type t
. The first use of id
in the condition of if
will then constrain t
to be bool
, while the
other uses in the two branches will enforce t
to be int
. Thus the
clash in the <mgu/>
cell.
Similarly, the program
let id = lambda x . x in id id
yields a different kind of conflict: if id
has type t -> t
, in order
to apply id
to itself it must be the case that its argument, t
, equals
t -> t
. These two type terms cannot be unified because there is a
circular dependence on t
, so we get a cycle in the <mgu/>
cell.
Both limitations above will be solved when we change the semantics of
let
later on, to account for the desired polymorphism.
Before we conclude this lesson, let us see one more interesting
example, where the lack of let-polymorphism leads not to a type error,
but to a less generic type:
let f1 = lambda x . x in let f2 = f1 in let f3 = f2 in let f4 = f3 in let f5 = f4 in if (f5 true) then f2 else f3
Our current type inferencer will infer the type bool -> bool
for the
program above. Nevertheless, since all functions f1
, f2
, f3
, f4
, f5
are the identity function, which is polymorphic, we would expect the
entire program to type to the same polymorphic identity function type.
This limitation will also be addressed when we define our
let-polymorphic type inferencer.
Before that, in the next lesson we will show how easily we can turn
the naive substitution-based type inferencer discussed in this lesson
into a similarly naive, but environment-based type inferencer.
Go to Lesson 5, Type Systems: A Naive Environment-Based Type Inferencer.
In this lesson you learn how to define a naive environment-based type
inferencer for a higher-order language. Specifically, we take the
substitution-based type inferencer for LAMBDA defined in Lesson 4 and
turn it into an environment-based one.
Recall from Lesson 3, where we defined an environment-based type
checker for LAMBDA based on the substitution-based one in Lesson 2,
that the transition from a substitution-based definition to an
environment-based one was quite systematic and mechanical: each
substitution occurrence E[T/X]
is replaced by E
, but at the same time
the variable X
is bound to type T
in the type environment. One benefit
of using type environments instead of substitution is that we replace
a linear complexity operation (the substitution) with a constant
complexity one (the variable lookup).
There is not much left to say which has not been already said in
Lesson 3: we remove the unnecessary binder annotations for the
variable binding operations, then add a <tenv/>
cell to the
configuration to hold the type environment, then add a new rule for
variable lookup, and finally apply the transformation of substitutions
E[T/X]
into E
as explained above.
The resulting type inferencer should now work exactly the same way as
the substitution-based one, except, of course, that the resulting
configurations will contain a <tenv/>
cell now.
As a sanity check, let us consider two more LAMBDA programs that test
the static scoping nature of the inferencer. We do that because
faulty environment-based definitions often have this problem. The
program
let x = 1 in let f = lambda a . x in let x = true in f 3
should type to int
, not to bool
, and so it does. Similarly, the
program
let y = 0 in letrec f x = if x <= 0 then y else let y = true in f (x + 1) in f 1
should also type to int
, not bool
, and so it does, too.
The type inferencer defined in this lesson has the same limitations,
in terms of polymorphism, as the one in Lesson 4. In the next
lesson we will see how it can be parallelized, and in further lessons
how to make it polymorphic.
Go to Lesson 6, Type Systems: Parallel Type Checkers/Inferencers.
In this lesson you learn how to define parallel type checkers or
inferencers. To make a concrete choice, we will parallelize the one in
the previous lesson, but the ideas are general. We are using the same
idea to define type checkers for other languages in the K tool
distribution, such as SIMPLE and KOOL.
The idea is in fact quite simple. Instead of one monolithic typing
task, we generate many smaller tasks, which can be processed in
parallel. We use the same approach to define parallel semantics as we
used for threads in IMP++ in Part 4 of the tutorial, that is, we add a
cell holding all the parallel tasks, making sure we declare the cell
holding a task with multiplicity *. For the particular type
inferencer that we chose here, the one in Lesson 5, each task will
hold an expression to type together with a type environment (so it
knows where to look up its free variables). We have the following
configuration then:
  configuration <tasks color="yellow">
                  <task color="orange" multiplicity="*">
                    <k color="green"> $PGM:Exp </k>
                    <tenv color="red"> .Map </tenv>
                  </task>
                </tasks>
                <mgu color="blue"> .Mgu </mgu>
Now we have to take each typing rule we had before and change it to
yield parallel typing. For example, our rule for typing
multiplication was the following in Lesson 5:
rule T1:Type * T2:Type => T1 = int ~> T2 = int ~> int
Since * was strict, its two arguments eventually type, and once that
happens the rule above fires. Unfortunately, the strictness of
multiplication makes the typing of the two expressions sequential in
our previous definition. To avoid typing the two expressions
sequentially and generate two parallel tasks instead, we remove the
strict attribute of multiplication and replace the rule above with the
following:
  rule <k> E1 * E2 => int ...</k> <tenv> Rho </tenv>
       (. => <task> <k> E1 = int </k> <tenv> Rho </tenv> </task>
             <task> <k> E2 = int </k> <tenv> Rho </tenv> </task>)
Therefore, we generate two tasks for typing E1 and E2 in the same type
environment as the current task, and let the current task continue by
simply optimistically reducing E1*E2 to its expected result type, int.
If E1 or E2 does not type to int, then either their corresponding
tasks will get stuck or the <mgu/> cell will end up in a clash or cycle,
so the program will not type overall in spite of the fact that we
allowed the task containing the multiplication to continue. This is
how we get maximum parallelism in this case.
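To picture the task-splitting step outside K, here is a minimal Python sketch that mirrors the shape of the parallel multiplication rule above. It is only a model, not K code: the names Task and split_mul are hypothetical, and the type environment is represented as an immutable tuple of (name, type) pairs so that tasks can safely share it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One typing obligation: `expr` must have type `expected` under `tenv`."""
    expr: object
    expected: str
    tenv: tuple  # immutable snapshot of the type environment


def split_mul(e1, e2, tenv):
    """Model of the parallel rule for E1 * E2 (hypothetical helper).

    The current task optimistically continues with the result type "int",
    while two new, independent tasks check that each operand types to "int"
    in the same environment.
    """
    spawned = [Task(e1, "int", tenv), Task(e2, "int", tenv)]
    return "int", spawned


result_type, tasks = split_mul("x", ("+", "y", 1), (("x", "int"), ("y", "int")))
```

The two spawned tasks do not depend on each other, which is what allows them to be processed in any order or in parallel.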
Before we continue, note that the new tasks hold equalities in them,
where one of the arguments is an expression, while previously the
equality construct was declared to take types. What we want now is
for the equality construct to possibly take any expressions, and first
type them and then generate the type constraint like before. This can
be done very easily by just extending the equality construct to
expressions and declaring it strict:
syntax KItem ::= Exp "=" Exp [strict]
Unlike before, where we only passed types to the equality construct,
we now need a runtime check that its arguments are indeed types before
we can generate the updateMgu command:
rule <k> T:Type = T':Type => . ...</k> <mgu> Theta:Mgu => updateMgu(Theta,T,T') </mgu>
Like before, an equality will therefore update the <mgu/> cell and then
dissolve itself, leaving the <k/> cell in the corresponding task
empty. Such empty tasks are unnecessary, so they can be erased:
rule <task>... <k> . </k> ...</task> => .
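The updateMgu step used by the equality rule above can be pictured as ordinary first-order unification. The following Python sketch is an assumption-laden model, not the K implementation: types are "int", "bool", a TVar, or a tuple ("->", argument, result), and update_mgu, resolve and occurs are hypothetical names. It raises TypeError for the two failure modes mentioned earlier, a clash and a cycle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TVar:
    """A type variable (a "fresh type" in the tutorial's terminology)."""
    name: str


def resolve(theta, t):
    """Follow variable bindings in theta until t is not a bound variable."""
    while isinstance(t, TVar) and t in theta:
        t = theta[t]
    return t


def occurs(theta, var, t):
    """Check whether var occurs in t once theta's bindings are followed."""
    t = resolve(theta, t)
    if t == var:
        return True
    if isinstance(t, tuple):
        return occurs(theta, var, t[1]) or occurs(theta, var, t[2])
    return False


def update_mgu(theta, t1, t2):
    """Return theta extended so that t1 equals t2, or raise TypeError."""
    t1, t2 = resolve(theta, t1), resolve(theta, t2)
    if t1 == t2:
        return theta
    if isinstance(t1, TVar):
        if occurs(theta, t1, t2):
            raise TypeError("cycle")      # e.g. a = a -> int
        return {**theta, t1: t2}
    if isinstance(t2, TVar):
        return update_mgu(theta, t2, t1)
    if isinstance(t1, tuple) and isinstance(t2, tuple):
        theta = update_mgu(theta, t1[1], t2[1])
        return update_mgu(theta, t1[2], t2[2])
    raise TypeError("clash")              # e.g. int = bool


a, b = TVar("a"), TVar("b")
theta = update_mgu({}, ("->", a, "int"), ("->", "bool", b))
```

Because every task only adds equalities to one shared substitution, the order in which tasks contribute their constraints does not change whether the program types.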
We can now follow the same style as for multiplication to write the
parallel typing rules of the other arithmetic constructs, and even for
the conditional.
To parallelize the typing of lambda we generate two fresh types, one
for the variable and one for the body, and make sure that we generate
the correct type constraint and environment in the body task:
  rule <k> lambda X . E => Tx -> Te ...</k> <tenv> TEnv </tenv>
       (. => <task> <k> E = Te </k> <tenv> TEnv[Tx/X] </tenv> </task>)
    when fresh(Tx:Type) andBool fresh(Te:Type)
Note that, with the above, we no longer need to change and then
recover the environment of the current cell.
For function application we also need to generate two fresh types:
  rule <k> E1 E2 => T ...</k> <tenv> Rho </tenv>
       (. => <task> <k> E1 = T2 -> T </k> <tenv> Rho </tenv> </task>
             <task> <k> E2 = T2 </k> <tenv> Rho </tenv> </task>)
    when fresh(T2:Type) andBool fresh(T:Type)
The only rule left is that of mu X . E. In this case we only need one
fresh type, because X, E and mu X . E all have the same type:
  rule <k> mu X . E => T ...</k> <tenv> TEnv </tenv>
       (. => <task> <k> E = T </k> <tenv> TEnv[T/X] </tenv> </task>)
    when fresh(T:Type)
We do not need the type environment recovery operation, so we delete it.
We can now kompile and krun all the programs that we typed in Lesson 5.
Everything should work.
In this lesson we only aimed at parallelizing the type inferencer in
Lesson 5, not at improving its expressiveness; it still has the same
limitations in terms of polymorphism. The next lessons are dedicated
to polymorphic type inferencers.
Go to Lesson 7, Type Systems: A Naive Substitution-based Polymorphic Type Inferencer.
In this lesson you learn how little it takes to turn a naive monomorphic
type inferencer into a naive polymorphic one, basically only changing
a few characters. In terms of the K framework, you will learn that
you can have complex combinations of substitutions in K, both over
expressions and over types.
Let us start directly with the change. All we have to do is to take
the LAMBDA type inferencer in Lesson 4 and only change the macro
rule let X = E in E' => (lambda X . E') E [macro]
as follows:
rule let X = E in E' => E'[E/X] [macro]
In other words, we are inlining the beta-reduction rule of
lambda-calculus within the original rule. In terms of typing,
the above forces the type inferencer to type E in place for each
occurrence of X in E'. Unlike in the first rule, where X had to get
one type only which satisfied the constraints of all X's occurrences in
E', we now never associate any type to X anymore.
Let us kompile and krun some examples. Everything that worked with
the type inferencer in Lesson 4 should still work here, although the
types of some programs can now be more general. For example, reconsider
the nested-lets.lambda program
let f1 = lambda x . x in let f2 = f1 in let f3 = f2 in let f4 = f3 in let f5 = f4 in if (f5 true) then f2 else f3
which was previously typed to bool -> bool. With the new rule above,
the sequence of lets is iteratively eliminated and we end up with the
program
if (lambda x . x) true then (lambda x . x) else (lambda x . x)
which now types (with both type inferencers) to a type of the form
t -> t, for some type variable t, which is more general than the
previous bool -> bool type that the program typed to in Lesson 4.
We can also now type programs that were not typable before, such as
let id = lambda x . x in if (id true) then (id 1) else (id 2)
and
let id = lambda x . x in id id
Let us also test it on some trickier programs, also not typable
before, such as
let f = lambda x . x in let g = lambda y . f y in g g
which gives us a type of the form t -> t for some type variable t,
and
let f = let g = lambda x . x in let h = lambda x . lambda x . (g g g g) in h in f
which types to t1 -> t2 -> t3 -> t3 for some type variables t1, t2, t3.
Here is another program which was not typable before, which is
trickier than the others above in that a lambda-bound variable appears
free in a let-bound expression:
lambda x . ( let y = lambda z . x in if (y true) then (y 1) else (y (lambda x . x)) )
The above presents no problem now, because once lambda z . x gets
substituted for y we get a well-typed expression which yields that x
has the type bool, so the entire expression types to bool -> bool.
The cheap type inferencer that we obtained above therefore works as
expected. However, it has two problems which justify a more advanced
solution. First, substitution is typically considered an elegant
mathematical instrument which is not too practical in implementations,
so an implementation of this type inferencer will likely be based on
type environments anyway. Additionally, we mix two kinds of
substitutions in this definition, one where we substitute types and
another where we substitute expressions, which can only make things
harder to implement efficiently. Second, our naive substitution of E
for X in E' can yield an exponential explosion in the size of the original
program. Consider, for example, the following classic example which
is known to generate a type whose size is exponential in the size of
the program (and is thus used as an argument for why let-polymorphic
type inference is exponential in the worst-case):
let f00 = lambda x . lambda y . x in
  let f01 = lambda x . f00 (f00 x) in
    let f02 = lambda x . f01 (f01 x) in
      let f03 = lambda x . f02 (f02 x) in
        let f04 = lambda x . f03 (f03 x) in
          // ... you can add more nested lets here
          f04
The particular instance of the pattern above generates a type which
has 17 type variables! The desugaring of each let doubles the size of
the program and of its resulting type. While such programs are
unlikely to appear in practice, it is often the case that functions can
be quite complex and large while their type can be quite simple in the
end, so we should simply avoid retyping each function each time it is
used.
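The blow-up can be checked with a small Python model of the substitution-based let rule applied to the f00 ... f04 pattern above. This is only a sketch under stated assumptions: terms are tuples ("var", x), ("lam", x, body) and ("app", f, a), the substituted terms are closed (so no capture can occur), and the helper names are hypothetical. The type-variable recurrence is derived by hand from the pattern: the type of each f(k+1) contains two fresh copies of the non-result variables of f(k).

```python
def subst(e, x, v):
    """Replace free occurrences of variable x in e by v (v is assumed closed)."""
    tag = e[0]
    if tag == "var":
        return v if e[1] == x else e
    if tag == "lam":
        # A lambda that rebinds x shadows it, so its body is left untouched.
        return e if e[1] == x else ("lam", e[1], subst(e[2], x, v))
    return ("app", subst(e[1], x, v), subst(e[2], x, v))


def size(e):
    """Number of AST nodes in e."""
    tag = e[0]
    if tag == "var":
        return 1
    if tag == "lam":
        return 1 + size(e[2])
    return 1 + size(e[1]) + size(e[2])


def expanded_sizes(n):
    """Sizes of f00 .. f<n> after every earlier let has been inlined."""
    current = ("lam", "x", ("lam", "y", ("var", "x")))  # f00
    sizes = [size(current)]
    # Next function: lambda x . f (f x), with f the previous expanded definition.
    template = ("lam", "x",
                ("app", ("var", "f"), ("app", ("var", "f"), ("var", "x"))))
    for _ in range(n):
        current = subst(template, "f", current)
        sizes.append(size(current))
    return sizes


def type_variable_count(n):
    """Distinct type variables in the type of f<n>: 2 for f00, then 2*k - 1."""
    count = 2
    for _ in range(n):
        count = 2 * count - 1
    return count


print(expanded_sizes(4))       # [3, 10, 24, 52, 108]
print(type_variable_count(4))  # 17
```

Each inlining step slightly more than doubles the term, and the count of 17 type variables for f04 agrees with the figure quoted above.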
This is precisely what we will do next. Before we present the classic
let-polymorphic type inferencer in Lesson 9, which is based on
environments, we first quickly discuss in Lesson 8 an intermediate
step, namely a naive environment-based variant of the inferencer
defined here.
Go to Lesson 8, Type Systems: A Naive Environment-based Polymorphic Type Inferencer.
In this short lesson we discuss how to quickly turn a naive
environment-based monomorphic type inferencer into a naive let-polymorphic
one. Like in the previous lesson, we only need to change a few
characters. In terms of the K framework, you will learn how to have
both environments and substitution in the same definition.
Like in the previous lesson, all we have to do is to take the LAMBDA
type inferencer in Lesson 5 and only change the rule
rule let X = E in E' => (lambda X . E') E
as follows:
rule let X = E in E' => E'[E/X]
The reasons why this works have already been explained in the previous
lesson, so we do not repeat them here.
Since our new let rule uses substitution, we have to require the
substitution module at the top and also import SUBSTITUTION in the
current module, besides the already existing UNIFICATION.
Everything which worked with the type inferencer in Lesson 7 should
also work now. Let us only try the exponential type example,
let f00 = lambda x . lambda y . x in
  let f01 = lambda x . f00 (f00 x) in
    let f02 = lambda x . f01 (f01 x) in
      let f03 = lambda x . f02 (f02 x) in
        let f04 = lambda x . f03 (f03 x) in
          f04
As expected, this gives us precisely the same type as in Lesson 7.
So the only difference between this type inferencer and the one in
Lesson 7 is that substitution is only used for LAMBDA-to-LAMBDA
transformations, but not for infusing types within LAMBDA programs.
Thus, the syntax of LAMBDA programs is preserved intact, which some
may prefer. Nevertheless, this type inferencer is still expensive and
wasteful, because the let-bound expression is typed over and over
again in each place where the let-bound variable occurs.
In the next lesson we will discuss a type inferencer based on the
classic Damas-Hindley-Milner type system, which maximizes the reuse of
typing work by means of parametric types.
Go to Lesson 9, Type Systems: Let-Polymorphic Type Inferencer (Damas-Hindley-Milner).
In this lesson we discuss a type inferencer based on what we call today
the Damas-Hindley-Milner type system, which is at the core of many
modern functional programming languages. The first variant of it was
proposed by Hindley in 1969, then, interestingly, Milner rediscovered
it in 1978 in the context of the ML language. Damas formalized it as
a type system in his PhD thesis in 1985. More specifically, our type
inferencer here, like many others as well as many implementations of
it, follows more closely the syntax-driven variant proposed by Clement
in 1987.
In terms of K, we will see how easily we can turn one definition which
is considered naive (our previous type inferencer in Lesson 8) into a
definition which is considered advanced. All we have to do is to
change one existing rule (the rule of the let binder) and to add a new
one. We will also learn some new predefined features of K, which make
the above possible.
The main idea is to replace the rule
rule let X = E in E' => E'[E/X]
which creates potentially many copies of E within E' with a rule
which types E once and then reuses that type in each place where X
occurs free in E'. The simplest K way to type E is to declare the
let construct strict(2). Now we cannot simply bind X to the type
of E, because we would obtain a variant of the naive type inferencer
we already discussed, together with its limitations, in Lesson 5 of this
tutorial. The trick here is to parameterize the type of E in all its
unconstrained fresh types, and then create fresh copies of those
parameters in each free occurrence of X in E'.
Let us discuss some examples, before we go into the technical details.
Consider the first let-polymorphic example which failed to be typed
with our first naive type-inferencer:
let id = lambda x . x in if (id true) then (id 1) else (id 2)
When typing lambda x . x, we get a type of the form t -> t, for some
fresh type t. Instead of assigning this type to id as we did in the
naive type inferencers, we now first parameterize this type in its
fresh variable t, written
(forall t) t -> t
and then bind id to this parametric type. The intuition for the
parameter is that it can be instantiated with any other type, so this
parametric type stands, in fact, for infinitely many non-parametric
types. This is similar to what happens in formal logic proof systems,
where rule schemas stand for infinitely many concrete instances of
them. For this reason, parametric types are also called type schemas.
Now each time id is looked up within the let-body, we create a fresh
copy of the parameter t, which can this way be independently
constrained by each local context. Let's suppose that the three id
lookups yield the types t1 -> t1, t2 -> t2, and respectively t3 -> t3.
Then t1 will be constrained to be bool, and t2 and t3 to be int,
so we can now safely type the program above to int.
Therefore, a type schema comprises a summary of all the typing work
that has been done for typing the corresponding expression, and an
instantiation of its parameters with fresh copies represents an
elegant way to reuse all that typing work.
There are some subtleties regarding what fresh types can be made
parameters. Let us consider another example, discussed as part of
Lesson 7 on naive let-polymorphism:
lambda x . ( let y = lambda z . x in if (y true) then (y 1) else (y (lambda x . x)) )
This program should type to bool -> bool, as explained in Lesson 7.
The lambda construct will bind x to some fresh type tx. Then the
let-bound expression lambda z . x types to tz -> tx for some
additional fresh type tz. The question now is what should the
parameters of this type be when we generate the type schema? If we
naively parameterize in all fresh variables, that is in both tz and
tx, obtaining the type schema (forall tz,tx) tz -> tx, then there will
be no way to infer that the type of x, tx, must be bool! The
inferred type of this expression would then wrongly be tx -> t for
some fresh types tx and t. That's because the parameters are replaced
with fresh copies in each occurrence of y, and thus their relationship
to the original x is completely lost. This tells us that we cannot
parameterize in all fresh types that appear in the type of the
let-bound expression. In particular, we cannot parameterize in those
which some variables are already bound to in the current type
environment (like x is bound to tx in our example above).
In our example, the correct type schema is (forall tz) tz -> tx,
which now allows us to correctly infer that tx is bool.
Let us now discuss another example, which should fail to type:
lambda x . let f = lambda y . x y in if (f true) then (f 1) else (f 2)
This should fail to type because lambda y . x y is equivalent to x,
so the conditional imposes the conflicting constraints that x should be
a function whose argument is either a bool or an int. Let us try to
type it using our currently informal procedure. Like in the previous
example, x will be bound to a fresh type tx. Then the let-bound
expression types to ty -> tz with ty and tz fresh types, adding also
the constraint tx = ty -> tz. What should the parameters of this type
be? If we ignore the type constraint and simply make both ty and tz
parameters because no variable is bound to them in the type
environment (indeed, the only variable x in the type environment is
bound to tx), then we can wrongly type this program to tx -> tz
following a reasoning similar to the one in the example above.
In fact, in this example, neither ty nor tz can be a parameter, because
they are constrained by tx.
The examples above tell us two things: first, that we have to take the
type constraints into account when deciding the parameters of the
schema; second, that after applying the most-general-unifier solution
given by the type constraints everywhere, the remaining fresh types
appearing anywhere in the type environment are consequently constrained
and cannot be turned into parameters. Since the type environment can in
fact also hold type schemas, which already bind some types, we only need
to ensure that none of the fresh types appearing free anywhere in the
type environment are turned into parameters of type schemas.
Thanks to generic support offered by the K tool, we can easily achieve
all the above as follows.
First, add syntax for type schemas:
syntax TypeSchema ::= "(" "forall" Set ")" Type [binder]
The definition below will be given in such a way that the Set argument
of a type schema will always be a set of fresh types. We also declare
this construct to be a binder, so that we can make use of the generic
free variable function provided by the K tool.
We now replace the old rule for let
rule let X = E in E' => E'[E/X]
with the following rule:
  rule <k> let X = T:Type in E => E ~> tenv(TEnv) ...</k>
       <mgu> Theta:Mgu </mgu>
       <tenv> TEnv
        => TEnv[(forall freeVariables(applyMgu(Theta, T)) -Set
                        freeVariables(applyMgu(Theta, values TEnv))
                ) applyMgu(Theta, T) / X]
       </tenv>
So the type T of E is being parameterized and then bound to X in the
type environment. The current mgu Theta, which comprises all the type
constraints accumulated so far, is applied to both T and the types in
the type environment. The remaining fresh types in T which do not
appear free in the type environment are then turned into type parameters.
The function freeVariables returns, as expected, the free variables of
its argument as a Set; this is why we declared the type schema to be a
binder above.
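The computation performed by the let rule above can be modeled in a few lines of Python. This is a sketch under simplifying assumptions, not the K definition: the environment holds only plain types (no schemas), the substitution theta is an acyclic dict from variables to types, and generalize, apply_mgu and free_variables are hypothetical stand-ins for the rule and for applyMgu and freeVariables. The two calls at the end replay the two examples discussed earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TVar:
    """A type variable; other types are "int", "bool" or ("->", arg, res)."""
    name: str


def apply_mgu(theta, t):
    """Apply the substitution theta everywhere in t."""
    if isinstance(t, TVar):
        return apply_mgu(theta, theta[t]) if t in theta else t
    if isinstance(t, tuple):
        return ("->", apply_mgu(theta, t[1]), apply_mgu(theta, t[2]))
    return t


def free_variables(t):
    """Set of type variables occurring in t."""
    if isinstance(t, TVar):
        return {t}
    if isinstance(t, tuple):
        return free_variables(t[1]) | free_variables(t[2])
    return set()


def generalize(theta, t, tenv):
    """Return (parameters, type): the schema to bind the let variable to."""
    t = apply_mgu(theta, t)
    env_vars = set()
    for bound in tenv.values():
        env_vars |= free_variables(apply_mgu(theta, bound))
    # Only variables that the environment does not constrain become parameters.
    return free_variables(t) - env_vars, t


tx, ty, tz = TVar("tx"), TVar("ty"), TVar("tz")

# let y = lambda z . x, with x bound to tx: only tz is a parameter.
schema_ok = generalize({}, ("->", tz, tx), {"x": tx})

# let f = lambda y . x y, with the constraint tx = ty -> tz: no parameters.
schema_none = generalize({tx: ("->", ty, tz)}, ("->", ty, tz), {"x": tx})
```

In the second call both ty and tz occur in the environment once theta is applied, so the set difference is empty and the type stays monomorphic, as required for that program to be rejected.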
Now a LAMBDA variable in the type environment can be bound to either a
type or a type schema. In the first case, the previous rule we had
for variable lookup can be reused, but we have to make sure we check
that T there is of sort Type (adding a sort membership, for example).
In the second case, as explained above, we have to create fresh copies
of the parameters. This can be easily achieved with another
predefined K function, as follows:
rule <k> X:Id => freshVariables(Tvs,T) ...</k> <tenv>... X |-> (forall Tvs) T ...</tenv>
Indeed, freshVariables takes a set of variables and a term, and returns the
same term but with each of the given variables replaced by a fresh copy.
The operations freeVariables and freshVariables are useful in many K
definitions, so they are predefined in module substitution.k.
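The effect of freshVariables on a type schema can be illustrated with a short Python model. It is only a sketch: fresh_variables is a hypothetical re-implementation over the same tuple encoding of types used for illustration purposes, and fresh names are produced from a global counter, which assumes that no existing variable already uses a name of the form name#number.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class TVar:
    """A type variable; other types are "int", "bool" or ("->", arg, res)."""
    name: str


_fresh_ids = count()  # global source of unique suffixes (illustrative only)


def fresh_variables(params, t):
    """Return t with every variable in params replaced by a fresh copy."""
    renaming = {p: TVar(f"{p.name}#{next(_fresh_ids)}") for p in params}

    def rename(u):
        if isinstance(u, TVar):
            return renaming.get(u, u)  # non-parameters are left untouched
        if isinstance(u, tuple):
            return ("->", rename(u[1]), rename(u[2]))
        return u

    return rename(t)


t = TVar("t")
first = fresh_variables({t}, ("->", t, t))    # one lookup of id
second = fresh_variables({t}, ("->", t, t))   # another, independent lookup
```

Within one lookup both occurrences of the parameter are renamed to the same fresh variable, while different lookups get different variables, so each use of a let-bound name can be constrained independently.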
Our definition of this let-polymorphic type inferencer is now
complete. To test it, kompile it and then krun all the LAMBDA
programs discussed since Lesson 4. They should all work as expected.
Here we present several "real-world" language examples. These languages
demonstrate many of the features you would expect to find in a full-fledged
programming language.
Author: Grigore Roșu (grosu@illinois.edu)
Organization: University of Illinois at Urbana-Champaign
Author: Traian Florin Șerbănuță (traian.serbanuta@unibuc.ro)
Organization: University of Bucharest
This is the K semantic definition of the untyped SIMPLE language.
SIMPLE is intended to be a pedagogical and research language that captures
the essence of the imperative programming paradigm, extended with several
features often encountered in imperative programming languages.
A program consists of a set of global variable declarations and
function definitions. Like in C, function definitions cannot be
nested and each program must have one function called main,
which is invoked when the program is executed. To make it more
interesting and to highlight some of K's strengths, SIMPLE includes
the following features in addition to the conventional imperative
expression and statement constructs:
Multidimensional arrays and array references. An array evaluates
to an array reference, which is a special value holding a location (where
the elements of the array start) together with the size of the array;
the elements of the array can be array references themselves (particularly
when the array is multi-dimensional). Array references are ordinary values,
so they can be assigned to variables and passed/received by functions.
Functions and function values. Functions can have zero or
more parameters and can return abruptly using a return statement.
SIMPLE follows a call-by-value parameter passing style, with static scoping.
Function names evaluate to function abstractions, which hereby become ordinary
values in the language, just like the array references.
Blocks with locals. SIMPLE variables can be declared
anywhere, their scope being from the place where they are declared
until the end of the most nested enclosing block.
Input/Output. The expression read() evaluates to the
next value in the input buffer, and the statement write(e)
evaluates e and outputs its value to the output buffer. The
input and output buffers are lists of values.
Exceptions. SIMPLE has parametric exceptions (the value thrown as
an exception can be caught and bound).
Concurrency via dynamic thread creation/termination and
synchronization. One can spawn a thread to execute any statement.
The spawned thread shares with its parent its environment at creation time.
Threads can be synchronized via a join command which blocks the current thread
until the joined thread completes, via re-entrant locks which can be acquired
and released, as well as through rendezvous commands.
Like in many other languages, some of SIMPLE's constructs can be
desugared into a smaller set of basic constructs. We do that at the end
of the syntax module, and then we only give semantics to the core constructs.
Note: This definition is commented slightly more than others, because it is
intended to be one of the first non-trivial definitions that the new
user of K sees. We recommend that beginners first check the
language definitions discussed in the K tutorial.
module SIMPLE-UNTYPED-SYNTAX
  imports DOMAINS-SYNTAX
We start by defining the SIMPLE syntax. The language constructs discussed
above have the expected syntax and evaluation strategies. Recall that in K
we annotate the syntax with appropriate strictness attributes, thus giving
each language construct the desired evaluation strategy.
Recall from the K tutorial that identifiers are builtin and come under the
syntactic category Id. The special identifier for the function
main belongs to all programs, and plays a special role in the semantics,
so we declare it explicitly. This would not be necessary if the identifiers
were all included automatically in semantic definitions, but that is not
possible because of parsing reasons (e.g., K variables used to match
concrete identifiers would then be ambiguously parsed as identifiers). They
are only included in the parser generated to parse programs (and used by the
kast tool). Consequently, we have to explicitly declare all the
concrete identifiers that play a special role in the semantics, like
main below.
syntax Id ::= "main" [token]
There are two types of declarations: for variables (including arrays) and
for functions. We are going to allow declarations of the form
var x=10, a[10,10], y=23;
which is why we allow the var keyword to take a list of expressions. The
non-terminals used in the two productions below are defined shortly.
  syntax Stmt ::= "var" Exps ";"
                | "function" Id "(" Ids ")" Block
The expression constructs below are standard. Increment (++) takes
an expression rather than a variable because it can also increment an array
element. Recall that the syntax we define in K is what we call the syntax
of the semantics: while powerful enough to define non-trivial syntaxes
(thanks to the underlying SDF technology that we use), we typically refrain
from defining precise syntaxes, that is, ones which accept precisely the
well-formed programs (that would not be possible anyway in general). That job
is deferred to type systems, which can also be defined in K. In other words,
we are not making any effort to guarantee syntactically that only variables
or array elements are passed to the increment construct, we allow any
expression. Nevertheless, we will only give semantics to those, so expressions
of the form ++5, which parse (but which will be rejected by our type
system in the typed version of SIMPLE later), will get stuck when executed.
Arrays can be multidimensional and can hold other arrays, so their
lookup operation takes a list of expressions as argument and applies to an
expression (which can in particular be another array lookup), respectively.
The construct sizeOf gives the size of an array in number of elements
of its first dimension. Note that almost all constructs are strict. The only
constructs which are not strict are the increment (since its first argument
gets updated, so it cannot be evaluated), the input read which takes no
arguments so strictness is irrelevant for it, the logical and and or constructs
which are short-circuited, the thread spawning construct which creates a new
thread executing the argument expression and returns its unique identifier to
the creating thread (so it cannot just evaluate its argument in place), and the
assignment which is only strict in its second argument (for the same reason as
the increment).
  syntax Exp ::= Int | Bool | String | Id
               | "(" Exp ")"             [bracket]
               | "++" Exp
               > Exp "[" Exps "]"        [strict]
               > Exp "(" Exps ")"        [strict]
               | "-" Exp                 [strict]
               | "sizeOf" "(" Exp ")"    [strict]
               | "read" "(" ")"
               > left:
                 Exp "*" Exp             [strict, left]
               | Exp "/" Exp             [strict, left]
               | Exp "%" Exp             [strict, left]
               > left:
                 Exp "+" Exp             [strict, left]
               | Exp "-" Exp             [strict, left]
               > non-assoc:
                 Exp "<" Exp             [strict, non-assoc]
               | Exp "<=" Exp            [strict, non-assoc]
               | Exp ">" Exp             [strict, non-assoc]
               | Exp ">=" Exp            [strict, non-assoc]
               | Exp "==" Exp            [strict, non-assoc]
               | Exp "!=" Exp            [strict, non-assoc]
               > "!" Exp                 [strict]
               > left:
                 Exp "&&" Exp            [strict(1), left]
               | Exp "||" Exp            [strict(1), left]
               > "spawn" Block
               > Exp "=" Exp             [strict(2), right]
We also need comma-separated lists of identifiers and of expressions.
Moreover, we want them to be strict, that is, to evaluate to lists of results
whenever requested (e.g., when they appear as strict arguments of
the constructs above).
  syntax Ids  ::= List{Id,","}           [overload(Exps)]
  syntax Exps ::= List{Exp,","}          [overload(Exps), strict]  // automatically hybrid now
  syntax Exps ::= Ids
  syntax Val
  syntax Vals ::= List{Val,","}          [overload(Exps)]
  syntax Bottom
  syntax Bottoms ::= List{Bottom,","}    [overload(Exps)]
  syntax Ids ::= Bottoms
Most of the statement constructs are standard for imperative languages.
We syntactically distinguish between empty and non-empty blocks, because we
chose Stmts not to be a (;-separated) list of Stmt. Variables can be
declared anywhere inside a block, their scope
ending with the block. Expressions are allowed to be used for their side
effects only (followed by a semicolon ;). Functions are allowed
to abruptly return. The exceptions are parametric, i.e., one can throw a value
which is bound to the variable declared by catch. Threads can be
dynamically created and terminated, and can synchronize with join,
acquire, release and rendezvous. Note that the
strictness attributes obey the intended evaluation strategy of the various
constructs. In particular, the if-then-else construct is strict only in its
first argument (the if-then construct will be desugared into if-then-else),
while the loop constructs are not strict in any arguments. The print
statement construct is variadic, that is, it takes an arbitrary number of
arguments.
  syntax Block ::= "{" "}"
                 | "{" Stmt "}"

  syntax Stmt ::= Block
                | Exp ";"                               [strict]
                | "if" "(" Exp ")" Block "else" Block   [avoid, strict(1)]
                | "if" "(" Exp ")" Block                [macro]
                | "while" "(" Exp ")" Block
                | "for" "(" Stmt Exp ";" Exp ")" Block  [macro]
                | "return" Exp ";"                      [strict]
                | "return" ";"                          [macro]
                | "print" "(" Exps ")" ";"              [strict]
  // NOTE: print strict allows non-deterministic evaluation of its arguments
  // Either keep like this but document, or otherwise make Exps seqstrict.
  // Or define and use a different expression list here, which is seqstrict.
                | "try" Block "catch" "(" Id ")" Block
                | "throw" Exp ";"                       [strict]
                | "join" Exp ";"                        [strict]
                | "acquire" Exp ";"                     [strict]
                | "release" Exp ";"                     [strict]
                | "rendezvous" Exp ";"                  [strict]
The reason we allow Stmts as the first argument of for instead of Stmt
is that we want to allow more than one statement
to be executed when the loop is initialized. Also, as seen shortly, macros
may expand one statement into more statements; for example, an initialized
variable declaration statement var x=0; desugars into the two statements
var x; x=0; so if we use Stmt instead of Stmts in the production of for
above then we risk that the macro expansion
of statement var x=0; happens before the macro expansion of for,
also shown below, in which case the latter would not apply anymore because
of syntactic mismatch.
  syntax Stmt ::= Stmt Stmt   [right]

  // I wish I were able to write the following instead, but confuses the parser.
  //
  // syntax Stmts ::= List{Stmt,""}
  // syntax Top ::= Stmt | "function" Id "(" Ids ")" Block
  // syntax Pgm ::= List{Top,""}
  //
  // With that, I could have also eliminated the empty block
This part desugars some of SIMPLE's language constructs into core ones.
We only want to give semantics to core constructs, so we get rid of the
derived ones before we start the semantics. All desugaring macros below are
straightforward.
  rule if (E) S => if (E) S else {}
  rule for(Start Cond; Step) {S} => {Start while (Cond) {S Step;}}
  rule for(Start Cond; Step) {} => {Start while (Cond) {Step;}}
  rule var E1:Exp, E2:Exp, Es:Exps; => var E1; var E2, Es;
  rule var X:Id = E; => var X; X = E;
For the semantics, we can therefore assume from now on that each
conditional has both branches, that there are only while loops, and
that each variable is declared alone and without any initialization as part of
the declaration.
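The effect of these desugaring macros can be mimicked on a toy AST in Python. This is only an illustrative model, not how K applies macros: statements are hypothetical tuples such as ("var", [decls]), ("if", cond, then[, else]), ("for", start, cond, step, body) and ("block", [stmts]), a declaration is either a plain name or a (name, initializer) pair, and array declarators are ignored.

```python
def desugar(stmt):
    """Return the list of core statements that stmt expands to."""
    tag = stmt[0]
    if tag == "var":
        core = []
        for decl in stmt[1]:
            if isinstance(decl, tuple):            # initialized: (name, init)
                name, init = decl
                core.append(("var", [name]))       # var X;
                core.append(("expr", ("=", name, init)))  # X = E;
            else:
                core.append(("var", [decl]))       # one variable per declaration
        return core
    if tag == "if":
        # A missing else branch becomes an empty block.
        else_branch = stmt[3] if len(stmt) == 4 else ("block", [])
        return [("if", stmt[1], desugar_block(stmt[2]), desugar_block(else_branch))]
    if tag == "while":
        return [("while", stmt[1], desugar_block(stmt[2]))]
    if tag == "for":
        _, start, cond, step, body = stmt
        loop_body = ("block", body[1] + [("expr", step)])
        loop = ("while", cond, desugar_block(loop_body))
        return [("block", desugar(start) + [loop])]
    if tag == "block":
        return [desugar_block(stmt)]
    return [stmt]                                   # already a core construct


def desugar_block(block):
    """Desugar every statement inside a block, flattening the expansions."""
    return ("block", [core for stmt in block[1] for core in desugar(stmt)])


decls = desugar(("var", [("x", 10), "a", ("y", 23)]))
loop = desugar(("for", ("var", [("i", 0)]), ("<", "i", 3), ("++", "i"),
                ("block", [("print", "i")])))
```

After desugaring, only single uninitialized declarations, two-branch conditionals and while loops remain, which matches the assumptions stated above.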
endmodule

module SIMPLE-UNTYPED
  imports SIMPLE-UNTYPED-SYNTAX
  imports DOMAINS
Before one starts adding semantic rules to a K definition, one needs to
define the basic semantic infrastructure consisting of definitions for
values and configuration. As discussed in the definitions
in the K tutorial, the values are needed to know when to stop applying
the heating rules and when to start applying the cooling rules corresponding
to strictness or context declarations. The configuration serves as a backbone
for the process of configuration abstraction which allows users to only
mention the relevant cells in each semantic rule, the rest of the configuration
context being inferred automatically. Although in some cases the configuration
could be automatically inferred from the rules, we believe that it is very
useful for language designers/semanticists to actually think of and design
their configuration explicitly, so the current implementation of K requires
one to define it.
We here define the values of the language that the various fragments of
programs evaluate to. First, integers and Booleans are values. As discussed,
arrays evaluate to special array reference values holding (1) a location from
where the array's elements are contiguously allocated in the store, and
(2) the size of the array. Functions evaluate to function values as
λ-abstractions (we do not need to evaluate functions to closures
because each function is executed in the fixed global environment and
function definitions cannot be nested). Like in IMP and other
languages, we finally tell the tool that values are K results.
  syntax Val ::= Int | Bool | String
               | array(Int,Int)
               | lambda(Ids,Stmt)
  syntax Exp ::= Val
  syntax Exps ::= Vals
  syntax Vals ::= Bottoms
  syntax KResult ::= Val
                   | Vals  // TODO: should not need this
The inclusion of values in expressions follows the methodology of
syntactic definitions (like, e.g., in SOS): extend the syntax of the language
to encompass all values and additional constructs needed to give semantics.
In addition to that, it allows us to write the semantic rules using the
original syntax of the language, and to parse them with the same (now extended
with additional values) parser. If writing the semantics directly on the K
AST, using the associated labels instead of the syntactic constructs, then one
would not need to include values in expressions.
The K configuration of SIMPLE consists of a top level cell, T,
holding a threads cell, a global environment map cell genv
mapping the global variables and function names to their locations, a shared
store map cell store mapping each location to some value, a set cell
busy holding the locks which have been acquired but not yet released
by threads, a set cell terminated holding the unique identifiers of
the threads which already terminated (needed for join), input
and output list cells, and a nextLoc cell holding a natural
number indicating the next available location. Unlike in the small languages
in the K tutorial, where we used the fresh predicate to generate fresh
locations, in larger languages, like SIMPLE, we prefer to explicitly manage
memory. The location counter in nextLoc models an actual physical
location in the store; for simplicity, we assume arbitrarily large memory and
no garbage collection. The threads cell contains one thread
cell for each existing thread in the program. Note that the thread cell has
multiplicity *, which means that at any given moment there could be zero,
one or more thread cells. Each thread cell contains a
computation cell k, a control cell holding the various
control structures needed to jump to certain points of interest in the program
execution, a local environment map cell env mapping the thread local
variables to locations in the store, and finally a holds map cell
indicating what locks have been acquired by the thread and not released so far
and how many times (SIMPLE's locks are re-entrant). The control cell
currently contains only two subcells, a function stack fstack which
is a list and an exception stack xstack which is also a list.
One can add more control structures in the control cell, such as a
stack for break/continue of loops, etc., if the language is extended with more
control-changing constructs. Note that all cells except for k are
also initialized, in that they contain a ground term of their corresponding
sort. The k cell is initialized with the program that will be passed
to the K tool, as indicated by the $PGM variable, followed by the
execute task (defined shortly).
  // the syntax declarations below are required because the sorts are
  // referenced directly by a production and, because of the way KIL to KORE
  // is implemented, the configuration syntax is not available yet
  // should simply work once KIL is removed completely
  // check other definitions for this hack as well

  syntax ControlCell
  syntax ControlCellFragment

  configuration <T color="red">
                  <threads color="orange">
                    <thread multiplicity="*" type="Map" color="yellow">
                      <id color="pink"> -1 </id>
                      <k color="green"> $PGM:Stmt ~> execute </k>
                    //<br/> // TODO(KORE): support latex annotations #1799
                      <control color="cyan">
                        <fstack color="blue"> .List </fstack>
                        <xstack color="purple"> .List </xstack>
                      </control>
                    //<br/> // TODO(KORE): support latex annotations #1799
                      <env color="violet"> .Map </env>
                      <holds color="black"> .Map </holds>
                    </thread>
                  </threads>
                //<br/> // TODO(KORE): support latex annotations #1799
                  <genv color="pink"> .Map </genv>
                  <store color="white"> .Map </store>
                  <busy color="cyan"> .Set </busy>
                  <terminated color="red"> .Set </terminated>
                //<br/> // TODO(KORE): support latex annotations #1799
                  <input color="magenta" stream="stdin"> .List </input>
                  <output color="brown" stream="stdout"> .List </output>
                  <nextLoc color="gray"> 0 </nextLoc>
                </T>
We start by defining the semantics of declarations (for variables,
arrays and functions).
The SIMPLE syntax was desugared above so that each variable is
declared alone and its initialization is done as a separate statement.
The semantic rule below matches resulting variable declarations of the
form var X; on top of the k cell
(indeed, note that the k cell is complete, or round, to the
left, and is torn, or ruptured, to the right), allocates a fresh
location L in the store which is initialized with a special value ⊥
(indeed, the unit ., or nothing, is matched anywhere
in the map ‒note the tears at both sides‒ and replaced with the
mapping L ↦ ⊥), and binds X to L in the local
environment shadowing previous declarations of X, if any.
This possible shadowing of X requires us to update the
entire environment map, which is expensive and can significantly slow
down the execution of larger programs. On the other hand, since we know
that L is not already bound in the store, we simply add the binding
L ↦ ⊥ to the store, thus avoiding a potentially complete
traversal of the store map in order to update it. We prefer the approach
used for updating the store whenever possible, because, in addition to being
faster, it offers more true concurrency than the approach used for updating
the environment; indeed, according to the concurrent semantics of K, the
store is not frozen while L ↦ ⊥ is added to it, while the environment is
frozen during the update operation Env[X <- L]. The variable declaration
command is also removed from the top of the computation cell and the fresh
location counter is incremented. The undefined symbol ⊥ (written undefined
in the rule below) added in the store is of sort KItem, instead of Val, on
purpose; this way, the store lookup rules will get stuck when one attempts to
look up an uninitialized location. All the above happen in one transactional
step, with the rule below. Note also how configuration abstraction allows us
to only mention the needed cells; indeed, as the configuration above states,
the k and env cells are actually located within a
thread cell within the threads cell, but one need
not mention these: the configuration context of the rule is
automatically transformed to match the declared configuration
structure.
  syntax KItem ::= "undefined"

  rule <k> var X:Id; => .K ...</k>
       <env> Env => Env[X <- L] </env>
       <store>... .Map => L |-> undefined ...</store>
       <nextLoc> L => L +Int 1 </nextLoc>
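The effect of the rule above on the env, store and nextLoc cells can be
mimicked outside K. The following is only a minimal Python sketch, not part of
the SIMPLE definition: the names declare_var and UNDEFINED are hypothetical,
and plain dictionaries stand in for the K maps.

```python
# Hypothetical Python model of the `var X;` rule; dictionaries stand in for K maps.
UNDEFINED = object()  # plays the role of the undefined symbol, which is not a value


def declare_var(name, env, store, next_loc):
    """Return the (env, store, next_loc) triple reached after `var name;`."""
    loc = next_loc                         # the fresh location L is the counter itself
    new_env = {**env, name: loc}           # Env[X <- L]: shadows any earlier binding
    new_store = {**store, loc: UNDEFINED}  # L |-> undefined is added to the store
    return new_env, new_store, next_loc + 1


env, store, next_loc = declare_var("x", {}, {}, 0)
env, store, next_loc = declare_var("x", env, store, next_loc)  # redeclaration
```

After the second declaration, x is bound to location 1 while location 0 stays
allocated, which reflects the assumption of no garbage collection.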
The K semantics of the uni-dimensional array declaration is somewhat similar
to the above declaration of ordinary variables. First, note the
context declaration below, which requests the evaluation of the array
dimension. Once the dimension is evaluated, say to a natural number N,
N +Int 1 locations are allocated in the store for
an array of size N, the additional location (chosen to be the first
one allocated) holding the array reference value. The array reference
value array(L,N) states that the array has size N and its
elements are located contiguously in the store starting with location
L. The operation L … L' ↦ V, defined at the end of this
file in the auxiliary operation section, initializes each location in
the list L … L' to V. Note that, since the dimensions of
array declarations can be arbitrary expressions, this virtually means
that we can dynamically allocate memory in SIMPLE by means of array
declarations.
  context var _:Id[HOLE];

  rule <k> var X:Id[N:Int]; => .K ...</k>
       <env> Env => Env[X <- L] </env>
       <store>... .Map => L |-> array(L +Int 1, N)
                          (L +Int 1) ... (L +Int N) |-> undefined ...</store>
       <nextLoc> L => L +Int 1 +Int N </nextLoc>
    requires N >=Int 0
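The allocation performed by the rule above can be pictured with a small Python
sketch. It is an illustration under stated assumptions, not part of the K
definition: declare_array and UNDEFINED are hypothetical names, dictionaries
stand in for the K maps, and a tuple stands in for the array reference value.

```python
# Hypothetical Python model of the `var X[N];` rule; not part of the K definition.
UNDEFINED = object()  # stands for the undefined symbol stored in fresh cells


def declare_array(name, size, env, store, next_loc):
    """Allocate size + 1 locations: one for the reference, then the elements."""
    if size < 0:
        # The K rule has the side condition N >=Int 0 and would simply not apply.
        raise ValueError("array size must be non-negative")
    loc = next_loc
    new_env = {**env, name: loc}
    new_store = dict(store)
    new_store[loc] = ("array", loc + 1, size)    # array(L +Int 1, N)
    for cell in range(loc + 1, loc + 1 + size):  # (L +Int 1) ... (L +Int N)
        new_store[cell] = UNDEFINED
    return new_env, new_store, loc + 1 + size


env, store, next_loc = declare_array("a", 3, {}, {}, 0)
```

Here a is bound to location 0, which holds the reference to the three element
cells at locations 1 to 3, and the counter advances to 4.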
SIMPLE allows multi-dimensional arrays. For semantic simplicity, we
desugar them all into uni-dimensional arrays by code transformation.
This way, we only need to give semantics to uni-dimensional arrays.
First, note that the context rule above actually evaluates all the array
dimensions (that's why we defined the expression lists strict!).
Upon evaluating the array dimensions, the code generation rule below
desugars multi-dimensional array declarations to uni-dimensional declarations.
To this aim, we introduce two special unique variable identifiers,
$1 and $2. The first variable, $1, iterates
through and initializes each element of the first dimension with an array
of the remaining dimensions, declared as variable $2:
  syntax Id ::= "$1" [token] | "$2" [token]

  rule var X:Id[N1:Int, N2:Int, Vs:Vals];
    => var X[N1];
       {
         for(var $1 = 0; $1 <= N1 - 1; ++$1) {
           var $2[N2, Vs];
           X[$1] = $2;
         }
       }
Ideally, one would like to perform syntactic desugarings like the one
above before the actual semantics. Unfortunately, that was not possible in
this case because the dimension expressions of the multi-dimensional array need
to be evaluated first. Indeed, the desugaring rule above does not work if the
dimensions of the declared array are arbitrary expressions, because they can
have side effects (e.g., a[++x,++x]) and those side effects would be
propagated each time the expression is evaluated in the desugaring code (note
that both the loop condition and the nested multi-dimensional declaration
would need to evaluate the expressions given as array dimensions).
Functions are evaluated to λ-abstractions and stored like any other
values in the store. A binding is added into the environment for the function
name to the location holding its body. Similarly to the C language, SIMPLE
only allows function declarations at the top level of the program. More
precisely, the subsequent semantics of SIMPLE only works well when one
respects this requirement. Indeed, the simplistic context-free parser
generated by the grammar above is more generous than we may want, in that it
allows function declarations anywhere any declaration is allowed, including
inside arbitrary blocks. However, as the rule below shows, we are not
storing the declaration environment with the λ-abstraction value as
closures do. Instead, as seen shortly, we switch to the global environment
whenever functions are invoked, which is consistent with our requirement that
functions should only be declared at the top. Thus, if one declares local
functions, then one may see unexpected behaviors (e.g., when one shadows a
global variable before declaring a local function). The type checker of
SIMPLE, also defined in K (see examples/simple/typed/static),
discards programs which do not respect this requirement.
  rule <k> function F(Xs) S => .K ...</k>
       <env> Env => Env[F <- L] </env>
       <store>... .Map => L |-> lambda(Xs, S) ...</store>
       <nextLoc> L => L +Int 1 </nextLoc>
When we are done with the first pass (pre-processing), the computation
cell k contains only the token execute (see the configuration
declaration above, where the computation item execute was placed
right after the program in the k cell of the initial configuration)
and the cell genv is empty. In this case, we have to call
main() and to initialize the global environment by transferring the
contents of the local environment into it. We prefer to do it this way, as
opposed to processing all the top level declarations directly within the global
environment, because we want to avoid duplication of semantics: the syntax of
the global declarations is identical to that of their corresponding local
declarations, so the semantics of the latter suffices provided that we copy
the local environment into the global one once we are done with the
pre-processing. We want this separate pre-processing step precisely because
we want to create the global environment. All (top-level) functions end up
having their names bound in the global environment and, as seen below, they
are executed in that same global environment; this means, in particular,
that the functions "see" each other, allowing for mutual recursion, etc.
  syntax KItem ::= "execute"

  rule <k> execute => main(.Exps); </k>
       <env> Env </env>
       <genv> .Map => Env </genv>
We next define the K semantics of all the expression constructs.
When a variable X is the first computational task, and X is bound to some
location L in the environment, and L is mapped to some value V in the
store, then we rewrite X into V:
rule <k> X:Id => V ...</k> <env>... X |-> L ...</env> <store>... L |-> V:Val ...</store>
Note that the rule above excludes reading ⊥, because ⊥ is not
a value and V is checked at runtime to be a value.
The increment construct is tricky, because we want to allow both ++x and
++a[5]. Therefore, we need to extract the lvalue of the expression to
increment. To do that, we state that the expression to increment should be
wrapped by the auxiliary lvalue operation and then evaluated. The semantics
of this auxiliary operation is defined at the end of this file. For now, all
we need to know is that it takes an expression and evaluates to a location
value. Location values, also defined at the end of the file, are integers
wrapped with the operation loc, to distinguish them from ordinary
integers.
  context ++(HOLE => lvalue(HOLE))

  rule <k> ++loc(L) => I +Int 1 ...</k>
       <store>... L |-> (I => I +Int 1) ...</store>
There is nothing special about the following rules. They rewrite the
language constructs to their library counterparts when their arguments
become values of expected sorts:
  rule I1 + I2 => I1 +Int I2
  rule Str1 + Str2 => Str1 +String Str2
  rule I1 - I2 => I1 -Int I2
  rule I1 * I2 => I1 *Int I2
  rule I1 / I2 => I1 /Int I2 requires I2 =/=K 0
  rule I1 % I2 => I1 %Int I2 requires I2 =/=K 0
  rule - I => 0 -Int I
  rule I1 < I2 => I1 <Int I2
  rule I1 <= I2 => I1 <=Int I2
  rule I1 > I2 => I1 >Int I2
  rule I1 >= I2 => I1 >=Int I2
The equality and inequality constructs reduce to syntactic comparison
of the two argument values (which is what the equality on K terms does).
  rule V1:Val == V2:Val => V1 ==K V2
  rule V1:Val != V2:Val => V1 =/=K V2
The logical negation is clear, but the logical conjunction and disjunction
are short-circuited:
  rule ! T => notBool(T)
  rule true  && E => E
  rule false && _ => false
  rule true  || _ => true
  rule false || E => E
Untyped SIMPLE does not check array bounds (the dynamically typed version of
it, in examples/simple/typed/dynamic, does check for array out of
bounds). The first rule below desugars the multi-dimensional array access to
uni-dimensional array access; recall that the array access operation was
declared strict, so all sub-expressions involved are already values at this
stage. The second rule rewrites the array access to a lookup operation at a
precise location; we prefer to do it this way to avoid locking the store.
The semantics of the auxiliary lookup operation is straightforward,
and is defined at the end of the file.
  // The [anywhere] feature is underused, because it would only be used
  // at the top of the computation or inside the lvalue wrapper. So it
  // may not be worth, or we may need to come up with a special notation
  // allowing us to enumerate contexts for [anywhere] rules.
  rule V:Val[N1:Int, N2:Int, Vs:Vals] => V[N1][N2, Vs]
    [anywhere]

  rule array(L,_)[N:Int] => lookup(L +Int N)
    [anywhere]
The size of the array is stored in the array reference value, and the
sizeOf construct was declared strict, so:
rule sizeOf(array(_,N)) => N
Function application was strict in both its arguments, so we can
assume that both the function and its arguments are evaluated to
values (the former expected to be a λ-abstraction). The first
rule below matches a well-formed function application on top of the
computation and performs the following steps atomically: it switches
to the function body followed by return; (for the case in
which the function does not use an explicit return statement); it
pushes the remaining computation, the current environment, and the
current control data onto the function stack (the remaining
computation can thus also be discarded from the computation cell,
because an unavoidable subsequent return statement ‒see
above‒ will always recover it from the stack); it switches the
current environment (which is being pushed on the function stack) to
the global environment, which is where the free variables in the
function body should be looked up; it binds the formal parameters to
fresh locations in the new environment, and stores the actual
arguments to those locations in the store (this latter step is easily
done by reducing the problem to variable declarations, whose semantics
we have already defined; the auxiliary operation mkDecls is
defined at the end of the file). The second rule pops the
computation, the environment and the control data from the function
stack when a return statement is encountered as the next
computational task, passing the returned value to the popped
computation (the popped computation was the context in which the
returning function was called). Note that the pushing/popping of the
control data is crucial. Without it, one may have a function that
contains an exception block with a return statement inside, which
would put the xstack cell in an inconsistent state (since the
exception block modifies it, but that modification should be
irrelevant once the function returns). We add an artificial
nothing value to the language, which is returned by the
nullary return; statements.
  syntax KItem ::= (Map,K,ControlCellFragment)

  rule <k> lambda(Xs,S)(Vs:Vals) ~> K => mkDecls(Xs,Vs) S return; </k>
       <control>
         <fstack> .List => ListItem((Env,K,C)) ...</fstack>
         C
       </control>
       <env> Env => GEnv </env>
       <genv> GEnv </genv>

  rule <k> return(V:Val); ~> _ => V ~> K </k>
       <control>
         <fstack> ListItem((Env,K,C)) => .List ...</fstack>
         (_ => C)
       </control>
       <env> _ => Env </env>

  syntax Val ::= "nothing"
  rule return; => return nothing;
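The interplay between the function stack and the control data in the two rules
above can be illustrated with a rough Python sketch. This is a simplified
model under stated assumptions, not the K semantics itself: the names call and
ret and the dictionary-based state are hypothetical, tasks are opaque strings,
and the xstack list plays the role of the control data C saved with each
frame.

```python
# Hypothetical Python model of function call and return; not part of the K definition.
def call(state, body_tasks):
    """Push (Env, K, C) on fstack, then run the body in the global environment."""
    frame = (state["env"], state["k"], list(state["xstack"]))
    state["fstack"].insert(0, frame)
    state["env"] = dict(state["genv"])
    state["k"] = list(body_tasks)


def ret(state, value):
    """Pop the frame and hand `value` to the computation that made the call."""
    env, k, xstack = state["fstack"].pop(0)
    state["env"] = env
    state["xstack"] = xstack  # (_ => C): discard handlers left behind by the callee
    state["k"] = [value] + k


state = {"k": ["print"], "env": {"x": 5}, "genv": {"f": 0},
         "fstack": [], "xstack": []}
call(state, ["body"])
env_in_callee = dict(state["env"])
state["xstack"].insert(0, "handler")  # the callee enters a try block ...
ret(state, 42)                        # ... and returns from inside it
```

Because the saved control data is restored on return, the handler installed
inside the callee does not leak into the caller, which is the inconsistency
that the text warns about.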
Like for division-by-zero, it is left unspecified what happens
when the nothing value is used in domain calculations. For
example, from the perspective of the language semantics,
7 +Int nothing can evaluate to anything, or
may not evaluate at all (be undefined). If one wants to make sure that
such artificial values are never misused, then one needs to define a static
checker (also using K, like the type checker in
examples/simple/typed/static) and reject programs that misuse them.
Note that, unlike the undefined symbol ⊥ which had the sort KItem
instead of Val, we defined nothing to be a value. That
is because, as explained above, we do not want the program to get
stuck when nothing is returned by a function. Instead, we want the
behavior to be unspecified; in particular, if one is careful to never
use the returned value in domain computation, like it happens when we
call a function for its side effects (e.g., with a statement of the
form f(x);), then the program does not get stuck.
The read() expression construct simply evaluates to the next
input value, at the same time discarding the input value from the
input cell.
rule <k> read() => I ...</k> <input> ListItem(I:Int) => .List ...</input>
In SIMPLE, like in C, assignments are expression constructs and not statement
constructs. To make it a statement all one needs to do is to follow it by a
semi-colon ; (see the semantics for expression statements below).
Like for the increment, we want to allow assignments not only to variables but
also to array elements, e.g., e1[e2] = e3 where e1 evaluates
to an array reference, e2 to a natural number, and e3 to any
value. Thus, we first compute the lvalue of the left-hand-side expression
that appears in an assignment, and then we do the actual assignment to the
resulting location:
  context (HOLE => lvalue(HOLE)) = _

  rule <k> loc(L) = V:Val => V ...</k> <store>... L |-> (_ => V) ...</store>
We next define the K semantics of statements.
Empty blocks are simply discarded, as shown in the first rule below.
For non-empty blocks, we schedule the enclosed statement but we have to
make sure the environment is recovered after the enclosed statement executes.
Recall that we allow local variable declarations, whose scope is the block
enclosing them. That is the reason for which we have to recover the
environment after the block. This allows us to have a very simple semantics
for variable declarations, as we did above. One can make the two rules below
computational if one wants them to count as computational steps.
  rule {} => .K
  rule <k> { S } => S ~> setEnv(Env) ...</k> <env> Env </env>
The basic definition of environment recovery is straightforward and
given in the section on auxiliary constructs at the end of the file.
There are two common alternatives to the above semantics of blocks.
One is to keep track of the variables which are declared in the block and only
recover those at the end of the block. This way one does more work for
variable declarations but conceptually less work for environment recovery; we
say "conceptually" because it is not clear that it is indeed the case that
one does less work when AC matching is involved. The other alternative is to
work with a stack of environments instead of a flat environment, and push the
current environment when entering a block and pop it when exiting it. This
way, one does more work when accessing variables (since one has to search the
variable in the environment stack in a top-down manner), but on the other hand
uses smaller environments and the definition gets closer to an implementation.
Based on experience with dozens of language semantics and other K definitions,
we have found that our approach above is the best trade-off between elegance
and efficiency (especially since rewrite engines have built-in techniques to
lazily copy terms, by need, thus not creating unnecessary copies),
so it is the one that we follow in general.
Sequential composition is desugared into K's builtin sequentialization
operation (recall that, like in C, the semi-colon ; is not a
statement separator in SIMPLE: it is either a statement terminator or a
construct that makes a statement from an expression). Note that K allows
us to define the semantics of SIMPLE in such a way that statements eventually
dissolve from the top of the computation when they are completed; this is in
sharp contrast to (artificially) "evaluating" them to a special
skip statement value and then getting rid of that special value, as
is the case in other semantic approaches (where everything must evaluate
to something). This means that once S₁ completes in the rule below, S₂
becomes automatically the next computation item without any additional
(explicit or implicit) rules.
rule S1:Stmt S2:Stmt => S1 ~> S2
A subtle aspect of the rule above is that S₁ is declared to have sort
Stmts and not Stmt. That is because desugaring macros can indeed
produce left associative sequential composition of statements. For example,
the code var x=0; x=1; is desugared to
(var x; x=0;) x=1;, so although originally the first term of
the sequential composition had sort Stmt, after desugaring it became
of sort Stmts. Note that the attribute [right] associated
to the sequential composition production is an attribute of the syntax, and not
of the semantics: e.g., it tells the parser to parse
var x; x=0; x=1; as var x; (x=0; x=1;), but it
does not tell the rewrite engine to rewrite (var x; x=0;) x=1; to
var x; (x=0; x=1;).
Expression statements are only used for their side effects, so their result
value is simply discarded. Common examples of expression statements are ones
of the form ++x;, x=e;, e1[e2]=e3;, etc.
rule _:Val; => .K
Since the conditional was declared with the strict(1) attribute, we
can assume that its first argument will eventually be evaluated. The rules
below cover the only two possibilities in which the conditional is allowed to
proceed (otherwise the rewriting process gets stuck).
  rule if ( true) S else _ => S
  rule if (false) _ else S => S
The simplest way to give the semantics of the while loop is by unrolling.
Note, however, that its unrolling is only allowed when the while loop reaches
the top of the computation (to avoid non-termination of unrolling). The
simple while loop semantics below works because our while loops in SIMPLE are
indeed very basic. If we allowed break/continue of loops then we would need
a completely different semantics, which would also involve the control cell.
rule while (E) S => if (E) {S while(E)S}
The print statement was strict, so all its arguments are now
evaluated (recall that print is variadic). We append each of
its evaluated arguments to the output buffer, and discard the residual
print statement with an empty list of arguments.
  rule <k> print(V:Val, Es => Es); ...</k> <output>... .List => ListItem(V) </output>
  rule print(.Vals); => .K
SIMPLE allows parametric exceptions, in that one can throw and catch a
particular value. The statement try S₁ catch(X) S₂
proceeds with the evaluation of S₁. If S₁ evaluates normally, i.e.,
without any exception thrown, then S₂ is discarded and the execution
continues normally. If S₁ throws an exception with a statement of the
form throw E, then E is first evaluated to some value V
(throw was declared to be strict), then V is bound to X, then
S₂ is evaluated in the new environment while the remainder of S₁ is
discarded, then the environment is recovered and the execution continues
normally with the statement following the try S₁ catch(X) S₂ statement.
Exceptions can be nested and the statements in the
catch part (S₂ in our case) can throw exceptions to the
upper level. One should be careful with how one handles the control data
structures here, so that the abrupt changes of control due to exception
throwing and to function returns interact correctly with each other.
For example, we want to allow function calls inside the statement S₁ in
a try S₁ catch(X) S₂ block which can throw an exception
that is not caught by the function but instead is propagated to the
try S₁ catch(X) S₂ block that called the function.
Therefore, we have to make sure that the function stack as well as other
potential control structures are also properly modified when the exception
is thrown to correctly recover the execution context. This can be easily
achieved by pushing/popping the entire current control context onto the
exception stack. The three rules below modularly do precisely the above.
  syntax KItem ::= (Id,Stmt,K,Map,ControlCellFragment)

  syntax KItem ::= "popx"

  rule <k> (try S1 catch(X) {S2} => S1 ~> popx) ~> K </k>
       <control>
         <xstack> .List => ListItem((X, S2, K, Env, C)) ...</xstack>
         C
       </control>
       <env> Env </env>

  rule <k> popx => .K ...</k>
       <xstack> ListItem(_) => .List ...</xstack>

  rule <k> throw V:Val; ~> _ => { var X = V; S2 } ~> K </k>
       <control>
         <xstack> ListItem((X, S2, K, Env, C)) => .List ...</xstack>
         (_ => C)
       </control>
       <env> _ => Env </env>
The catch statement S₂ needs to be executed in the original environment,
but where the thrown value V is bound to the catch variable X. We here
chose to rely on two previously defined constructs when giving semantics to
the catch part of the statement: (1) the variable declaration with
initialization, for binding X to V; and (2) the block construct for
preventing X from shadowing variables in the original environment upon the
completion of S₂.
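The three exception rules can be approximated by the following Python sketch.
It is a simplified model under stated assumptions, not the K semantics itself:
enter_try, popx and throw are hypothetical function names, the state is a
plain dictionary, tasks are opaque strings or tuples, and the fstack list
plays the role of the control data C saved on the exception stack.

```python
# Hypothetical Python model of try/catch/throw; not part of the K definition.
def enter_try(state, body, var, handler):
    """try S1 catch(X) S2: push (X, S2, K, Env, C) and schedule S1 ~> popx."""
    frame = (var, handler, list(state["k"]), dict(state["env"]),
             list(state["fstack"]))
    state["xstack"].insert(0, frame)
    state["k"] = [body, "popx"] + state["k"]


def popx(state):
    """S1 finished normally: drop the handler that was never used."""
    assert state["k"][0] == "popx"
    state["k"].pop(0)
    state["xstack"].pop(0)


def throw(state, value):
    """Discard the rest of S1 and run { var X = V; S2 } in the saved context."""
    var, handler, k, env, fstack = state["xstack"].pop(0)
    state["env"] = env
    state["fstack"] = fstack
    state["k"] = [("block", [("var", var, value), handler])] + k


state = {"k": ["after"], "env": {"x": 0}, "fstack": [], "xstack": []}
enter_try(state, "S1", "e", "S2")
state["fstack"].insert(0, "frame of f")  # S1 calls a function f ...
state["env"] = {}                        # ... which runs in another environment
throw(state, 7)                          # f throws without catching
```

After the throw, the function frame pushed inside S1 is gone and the original
environment is back, so the handler block runs in the context of the try
statement rather than that of the function that threw.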
SIMPLE's threads can be created and terminated dynamically, and can
synchronize by acquiring and releasing re-entrant locks and by rendezvous.
We discuss the rules giving the semantics of these operations below.
Threads can be created by any other threads using the spawn S
construct. The spawn expression construct evaluates to the unique identifier
of the newly created thread and, at the same time, a new thread cell is added
into the configuration, initialized with the S statement and sharing the
same environment with the parent thread. Note that the newly created
thread cell is torn. That means that the remaining cells are added
and initialized automatically as described in the definition of SIMPLE's
configuration. This is part of K's configuration abstraction mechanism.
  rule <thread>...
         <k> spawn S => !T:Int ...</k>
         <env> Env </env>
       ...</thread>
       (.Bag => <thread>...
               <k> S </k>
               <env> Env </env>
               <id> !T </id>
             ...</thread>)
Dually to the above, when a thread terminates, its assigned computation (the
contents of its k cell) is empty, so the thread can be dissolved.
However, since no discipline is imposed on how locks are acquired and released,
it can be the case that a terminating thread still holds locks. Those locks
must be released, so other threads attempting to acquire them do not deadlock.
We achieve that by removing all the locks held by the terminating thread in its
holds cell from the set of busy locks in the busy cell
(keys(H) returns the domain of the map H as a set, that is, only
the locks themselves ignoring their multiplicity). As seen below, a lock is
added to the busy cell as soon as it is acquired for the first time
by a thread. The unique identifier of the terminated thread is also collected
into the terminated cell, so the join construct knows which
threads have terminated.
  rule (<thread>... <k>.K</k> <holds>H</holds> <id>T</id> ...</thread> => .Bag)
       <busy> Busy => Busy -Set keys(H) </busy>
       <terminated>... .Set => SetItem(T) ...</terminated>
Thread joining is now straightforward: all we need to do is to check whether
the identifier of the thread to be joined is in the terminated cell.
If yes, then the join statement dissolves and the joining thread
continues normally; if not, then the joining thread gets stuck.
rule <k> join T:Int; => .K ...</k> <terminated>... SetItem(T) ...</terminated>
There are two cases to distinguish when a thread attempts to acquire a lock
(in SIMPLE any value can be used as a lock):
(1) The thread does not currently have the lock, in which case it has to
take it provided that the lock is not already taken by another thread (see
the side condition of the first rule).
(2) The thread already has the lock, in which case it just increments its
counter for the lock (the locks are re-entrant). These two cases are captured
by the two rules below:
  rule <k> acquire V:Val; => .K ...</k>
       <holds>... .Map => V |-> 0 ...</holds>
       <busy> Busy (.Set => SetItem(V)) </busy>
    requires (notBool(V in Busy))

  rule <k> acquire V; => .K ...</k>
       <holds>... V:Val |-> (N => N +Int 1) ...</holds>
Similarly, there are two corresponding cases to distinguish when a thread
releases a lock:
(1) The thread holds the lock more than once, in which case all it needs to do
is to decrement the lock counter.
(2) The thread holds the lock only once, in which case it needs to remove it
from its holds cell and also from the shared busy cell,
so other threads can acquire it if they need to.
  rule <k> release V:Val; => .K ...</k>
       <holds>... V |-> (N => N -Int 1) ...</holds>
    requires N >Int 0

  rule <k> release V; => .K ...</k> <holds>... V:Val |-> 0 => .Map ...</holds>
       <busy>... SetItem(V) => .Set ...</busy>
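The counting discipline of the acquire and release rules can be checked with a
small Python sketch. It is an illustration under stated assumptions rather
than the K semantics: the function names are hypothetical, each thread is
reduced to its holds dictionary, busy is a Python set, and a return value of
False models a thread for which no rule applies.

```python
# Hypothetical Python model of re-entrant locks; not part of the K definition.
def acquire(holds, busy, lock):
    """Return True if an acquire rule applies, False if the thread stays stuck."""
    if lock in holds:
        holds[lock] += 1      # already held: count one more acquisition
        return True
    if lock in busy:
        return False          # held by another thread: no rule applies
    holds[lock] = 0           # first acquisition is recorded with counter 0
    busy.add(lock)
    return True


def release(holds, busy, lock):
    """Return True if a release rule applies, False otherwise."""
    if lock not in holds:
        return False
    if holds[lock] > 0:
        holds[lock] -= 1      # still held after this release
    else:
        del holds[lock]       # last release: the lock becomes free again
        busy.remove(lock)
    return True


busy, holds1, holds2 = set(), {}, {}
steps = [
    acquire(holds1, busy, "m"),   # thread 1 takes the lock
    acquire(holds1, busy, "m"),   # re-entrant acquisition by thread 1
    acquire(holds2, busy, "m"),   # thread 2 is stuck
    release(holds1, busy, "m"),
    release(holds1, busy, "m"),   # thread 1 lets go completely
    acquire(holds2, busy, "m"),   # now thread 2 succeeds
]
```

Note that the counter starts at 0 on the first acquisition, so a lock acquired
n times needs n releases before it leaves the busy set.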
In addition to synchronization through acquire and release of locks, SIMPLE
also provides a construct for rendezvous synchronization. A thread whose next
statement to execute is rendezvous(V) gets stuck until another
thread reaches an identical statement; when that happens, the two threads
drop their rendezvous statements and continue their executions. If three
threads happen to have an identical rendezvous statement as their next
statement, then precisely two of them will synchronize and the other will
remain blocked until another thread reaches a similar rendezvous statement.
The rule below is as simple as it can be. Note, however, that, again, it is
K's mechanism for configuration abstraction that makes it work as desired:
since the only cell with multiplicity * that contains a k cell is
the thread cell, the only way to concretize the rule below to the
actual configuration of SIMPLE is to include each k cell in a
thread cell.
rule <k> rendezvous V:Val; => .K ...</k> <k> rendezvous V; => .K ...</k>
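The pairing behaviour of the rule above, including the three-thread case, can
be sketched in Python. This is a simplified model under stated assumptions:
rendezvous_step is a hypothetical name, threads are reduced to lists of
pending tasks, and the sketch always picks the first matching pair in
insertion order, whereas K may pick any matching pair.

```python
# Hypothetical Python model of rendezvous; not part of the K definition.
def rendezvous_step(threads):
    """Synchronize one pair of threads whose next task is the same rendezvous.

    `threads` maps a thread id to its list of pending tasks. Returns the pair
    of ids that synchronized, or None when no two threads agree.
    """
    waiting = {}
    for tid, tasks in threads.items():
        if tasks and tasks[0][0] == "rendezvous":
            value = tasks[0][1]
            if value in waiting:
                partner = waiting[value]
                threads[partner].pop(0)  # both threads drop their statement
                tasks.pop(0)
                return partner, tid
            waiting[value] = tid
    return None  # everybody stays blocked


threads = {
    "t1": [("rendezvous", 1), ("print", "a")],
    "t2": [("rendezvous", 1)],
    "t3": [("rendezvous", 1)],
}
first = rendezvous_step(threads)
second = rendezvous_step(threads)
```

The first step synchronizes two of the three threads; the second step finds no
partner for the remaining one, which therefore stays blocked.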
In this section we define all the auxiliary constructs used in the
above semantics.
The mkDecls auxiliary construct turns a list of identifiers
and a list of values into a sequence of corresponding variable
declarations.
syntax Stmt ::= mkDecls(Ids,Vals)  [function]
rule mkDecls((X:Id, Xs:Ids), (V:Val, Vs:Vals)) => var X=V; mkDecls(Xs,Vs)
rule mkDecls(.Ids,.Vals) => {}
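To make the recursion in mkDecls concrete, here is a hedged Python sketch of the same idea. The function name mk_decls, the string rendering of statements, and the ValueError raised for lists of different lengths are assumptions of this example; in the K definition a mismatched call simply has no applicable rule.

```python
def mk_decls(ids, vals):
    """Pair identifiers with values, producing one declaration per pair."""
    if not ids and not vals:
        # Corresponds to: mkDecls(.Ids, .Vals) => {}
        return "{}"
    if not ids or not vals:
        # Neither K rule matches lists of different lengths.
        raise ValueError("identifier and value lists differ in length")
    # Corresponds to: mkDecls((X, Xs), (V, Vs)) => var X=V; mkDecls(Xs, Vs)
    return f"var {ids[0]}={vals[0]}; " + mk_decls(ids[1:], vals[1:])
```

For instance, `mk_decls(["x", "y"], [1, 2])` yields the text `var x=1; var y=2; {}`, a sequence of initialized declarations ending in the empty block.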
The lookup operation below is straightforward: it rewrites lookup(L) to the
value stored at location L.
syntax Exp ::= lookup(Int)
rule <k> lookup(L) => V ...</k> <store>... L |-> V:Val ...</store>
We have already discussed the environment recovery auxiliary operation in the
IMP++ tutorial:
// TODO: eliminate the env wrapper, like we did in IMP++

syntax KItem ::= setEnv(Map)
rule <k> setEnv(Env) => .K ...</k> <env> _ => Env </env>
While theoretically sufficient, the basic definition for environment
recovery alone is suboptimal. Consider a loop while (E) S, where S is a
block, whose semantics (see above) was given by unrolling. Then the
semantics of blocks above, together with the unrolling semantics of the
while loop, will yield a computation structure in the k cell that keeps
growing, adding a new environment recovery task right in front of the
already existing sequence of similar environment recovery tasks (this
phenomenon is similar to the "tail recursion" problem). Of course, when we
have a sequence of environment recovery tasks, we only need to keep the
last one. The elegant rule below does precisely that, thus avoiding the
unnecessary computation explosion problem:
rule (setEnv(_) => .K) ~> setEnv(_)
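The effect of this rule can be sketched in Python by treating the contents of the k cell as a list of tasks with the next task first. The tuple encoding of tasks and the function name are assumptions made for illustration only.

```python
def drop_redundant_set_env(k):
    """Repeatedly drop a leading setEnv task that is followed by another setEnv."""
    k = list(k)  # work on a copy; index 0 is the top of the computation
    while len(k) >= 2 and k[0][0] == "setEnv" and k[1][0] == "setEnv":
        # Corresponds to: (setEnv(_) => .K) ~> setEnv(_)
        k.pop(0)
    return k
```

Only the last recovery task in a run of consecutive ones survives, so the environment that is eventually restored is the outermost one.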
In fact, the above follows a common convention in K for recovery
operations of cell contents: the meaning of a computation task of the form
cell(C) that reaches the top of the computation is that the current
contents of cell cell is discarded and gets replaced with C. We
did not add support for these special computation tasks in our current
implementation of K, so we need to define them as above.
For convenience in giving the semantics of constructs like the increment and
the assignment, which we want to operate the same way on variables and on
array elements, we used an auxiliary lvalue(E) construct which was
expected to evaluate to the lvalue of the expression E. This is only
defined when E has an lvalue, that is, when E is either a variable or
evaluates to an array element. lvalue(E) evaluates to a value of
the form loc(L), where L is the location where the value of E
can be found; for clarity, we use loc to structurally distinguish
natural numbers from location values. In giving semantics to lvalue
there are two cases to consider. (1) If E is a variable, then all we need
to do is to grab its location from the environment. (2) If E is an array
element, then we first evaluate the array and its index in order to identify
the exact location of the element of concern, and then return that location;
the last rule below works because its preceding context declarations ensure
that the array and its index are evaluated, and then the rule for array lookup
(defined above) rewrites the evaluated array access construct to its
corresponding store lookup operation.
// For parsing reasons, we prefer to allow lvalue to take a K
syntax Exp ::= lvalue(K)
syntax Val ::= loc(Int)

// Local variable
rule <k> lvalue(X:Id => loc(L)) ...</k> <env>... X |-> L:Int ...</env>

// Array element: evaluate the array and its index;
// then the array lookup rule above applies.
context lvalue(_::Exp[HOLE::Exps])
context lvalue(HOLE::Exp[_::Exps])

// Finally, return the address of the desired object member
rule lvalue(lookup(L:Int) => loc(L))
The following operation initializes a sequence of locations with the same
value:
syntax Map ::= Int "..." Int "|->" K  [function]
rule N...M |-> _ => .Map  requires N >Int M
rule N...M |-> K => N |-> K (N +Int 1)...M |-> K  requires N <=Int M
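The two rules amount to a recursion over the interval from N to M. A minimal Python sketch of that recursion follows; the function name init_range and the use of a dict for the resulting map are assumptions of the example.

```python
def init_range(n, m, value):
    """Build a map sending every location in n..m (inclusive) to value."""
    if n > m:
        # Corresponds to: N...M |-> _ => .Map  requires N >Int M
        return {}
    # Corresponds to: N...M |-> K => N |-> K (N +Int 1)...M |-> K
    result = {n: value}
    result.update(init_range(n + 1, m, value))
    return result
```

Both endpoints are included, and an empty interval such as `init_range(4, 3, v)` produces the empty map.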
The semantics of SIMPLE is now complete. Make sure you kompile the
definition with the right options in order to generate the desired model.
No kompile options are needed if you only want to execute the definition
(and thus get an interpreter), but if you want to search for different
program behaviors then you need to kompile with the --enable-search option.
endmodule
Author: Grigore Roșu (grosu@illinois.edu)
Organization: University of Illinois at Urbana-Champaign
Author: Traian Florin Șerbănuță (traian.serbanuta@unibuc.ro)
Organization: University of Bucharest
This is the K semantic definition of the untyped SIMPLE language.
SIMPLE is intended to be a pedagogical and research language that captures
the essence of the imperative programming paradigm, extended with several
features often encountered in imperative programming languages.
A program consists of a set of global variable declarations and
function definitions. Like in C, function definitions cannot be
nested and each program must have one function called main,
which is invoked when the program is executed. To make it more
interesting and to highlight some of K's strengths, SIMPLE includes
the following features in addition to the conventional imperative
expression and statement constructs:
Multidimensional arrays and array references. An array evaluates
to an array reference, which is a special value holding a location (where
the elements of the array start) together with the size of the array;
the elements of the array can be array references themselves (particularly
when the array is multi-dimensional). Array references are ordinary values,
so they can be assigned to variables and passed/received by functions.
Functions and function values. Functions can have zero or
more parameters and can return abruptly using a return statement.
SIMPLE follows a call-by-value parameter passing style, with static scoping.
Function names evaluate to function abstractions, which thereby become ordinary
values in the language, just like the array references.
Blocks with locals. SIMPLE variables can be declared
anywhere, their scope being from the place where they are declared
until the end of the most nested enclosing block.
Input/Output. The expression read() evaluates to the
next value in the input buffer, and the statement write(e)
evaluates e and outputs its value to the output buffer. The
input and output buffers are lists of values.
Exceptions. SIMPLE has parametric exceptions (the value thrown as
an exception can be caught and bound).
Concurrency via dynamic thread creation/termination and
synchronization. One can spawn a thread to execute any statement.
The spawned thread shares with its parent its environment at creation time.
Threads can be synchronized via a join command which blocks the current thread
until the joined thread completes, via re-entrant locks which can be acquired
and released, as well as through rendezvous commands.
Like in many other languages, some of SIMPLE's constructs can be
desugared into a smaller set of basic constructs. We do that at the end
of the syntax module, and then we only give semantics to the core constructs.
Note: This definition is commented slightly more than others, because it is
intended to be one of the first non-trivial definitions that a new
user of K sees. We recommend that beginners first check the
language definitions discussed in the K tutorial.
module SIMPLE-UNTYPED-SYNTAX
  imports DOMAINS-SYNTAX
We start by defining the SIMPLE syntax. The language constructs discussed
above have the expected syntax and evaluation strategies. Recall that in K
we annotate the syntax with appropriate strictness attributes, thus giving
each language construct the desired evaluation strategy.
Recall from the K tutorial that identifiers are builtin and come under the
syntactic category Id. The special identifier for the function
main belongs to all programs, and plays a special role in the semantics,
so we declare it explicitly. This would not be necessary if the identifiers
were all included automatically in semantic definitions, but that is not
possible because of parsing reasons (e.g., K variables used to match
concrete identifiers would then be ambiguously parsed as identifiers). They
are only included in the parser generated to parse programs (and used by the
kast tool). Consequently, we have to explicitly declare all the
concrete identifiers that play a special role in the semantics, like
main below.
syntax Id ::= "main" [token]
There are two types of declarations: for variables (including arrays) and
for functions. We are going to allow declarations of the form
var x=10, a[10,10], y=23;, which is why we allow the var
keyword to take a list of expressions. The non-terminals used in the two
productions below are defined shortly.
syntax Stmt ::= "var" Exps ";"
              | "function" Id "(" Ids ")" Block
The expression constructs below are standard. Increment (++) takes
an expression rather than a variable because it can also increment an array
element. Recall that the syntax we define in K is what we call the syntax
of the semantics: while powerful enough to define non-trivial syntaxes
(thanks to the underlying SDF technology that we use), we typically refrain
from defining precise syntaxes, that is, ones which accept precisely the
well-formed programs (that would not be possible anyway in general). That job
is deferred to type systems, which can also be defined in K. In other words,
we are not making any effort to guarantee syntactically that only variables
or array elements are passed to the increment construct, we allow any
expression. Nevertheless, we will only give semantics to those, so expressions
of the form ++5, which parse (but which will be rejected by our type
system in the typed version of SIMPLE later), will get stuck when executed.
Arrays can be multidimensional and can hold other arrays, so their
lookup operation takes a list of expressions as argument and applies to an
expression (which can in particular be another array lookup), respectively.
The construct sizeOf gives the size of an array in number of elements
of its first dimension. Note that almost all constructs are strict. The only
constructs which are not strict are the increment (since its first argument
gets updated, so it cannot be evaluated), the input read which takes no
arguments so strictness is irrelevant for it, the logical and and or constructs
which are short-circuited, the thread spawning construct which creates a new
thread executing the argument expression and returns its unique identifier to
the creating thread (so it cannot just evaluate its argument in place), and the
assignment which is only strict in its second argument (for the same reason as
the increment).
syntax Exp ::= Int | Bool | String | Id
             | "(" Exp ")"             [bracket]
             | "++" Exp
             > Exp "[" Exps "]"        [strict]
             > Exp "(" Exps ")"        [strict]
             | "-" Exp                 [strict]
             | "sizeOf" "(" Exp ")"    [strict]
             | "read" "(" ")"
             > left:
               Exp "*" Exp             [strict, left]
             | Exp "/" Exp             [strict, left]
             | Exp "%" Exp             [strict, left]
             > left:
               Exp "+" Exp             [strict, left]
             | Exp "-" Exp             [strict, left]
             > non-assoc:
               Exp "<" Exp             [strict, non-assoc]
             | Exp "<=" Exp            [strict, non-assoc]
             | Exp ">" Exp             [strict, non-assoc]
             | Exp ">=" Exp            [strict, non-assoc]
             | Exp "==" Exp            [strict, non-assoc]
             | Exp "!=" Exp            [strict, non-assoc]
             > "!" Exp                 [strict]
             > left:
               Exp "&&" Exp            [strict(1), left]
             | Exp "||" Exp            [strict(1), left]
             > "spawn" Block
             > Exp "=" Exp             [strict(2), right]
We also need comma-separated lists of identifiers and of expressions.
Moreover, we want them to be strict, that is, to evaluate to lists of results
whenever requested (e.g., when they appear as strict arguments of
the constructs above).
syntax Ids  ::= List{Id,","}           [overload(Exps)]
syntax Exps ::= List{Exp,","}          [overload(Exps), strict]  // automatically hybrid now
syntax Exps ::= Ids
syntax Val
syntax Vals ::= List{Val,","}          [overload(Exps)]
syntax Bottom
syntax Bottoms ::= List{Bottom,","}    [overload(Exps)]
syntax Ids ::= Bottoms
Most of the statement constructs are standard for imperative languages.
We syntactically distinguish between empty and non-empty blocks, because we
chose Stmts not to be a (;-separated) list of Stmt. Variables can be
declared anywhere inside a block, their scope ending with the block.
Expressions are allowed to be used for their side effects only (followed by
a semicolon ;). Functions are allowed to abruptly return. The exceptions are
parametric, i.e., one can throw a value which is bound to the variable
declared by catch. Threads can be dynamically created and terminated, and
can synchronize with join, acquire, release and rendezvous. Note that the
strictness attributes obey the intended evaluation strategy of the various
constructs. In particular, the if-then-else construct is strict only in its
first argument (the if-then construct will be desugared into if-then-else),
while the loop constructs are not strict in any arguments. The print
statement construct is variadic, that is, it takes an arbitrary number of
arguments.
syntax Block ::= "{" "}"
               | "{" Stmt "}"

syntax Stmt ::= Block
              | Exp ";"                                  [strict]
              | "if" "(" Exp ")" Block "else" Block      [avoid, strict(1)]
              | "if" "(" Exp ")" Block                   [macro]
              | "while" "(" Exp ")" Block
              | "for" "(" Stmt Exp ";" Exp ")" Block     [macro]
              | "return" Exp ";"                         [strict]
              | "return" ";"                             [macro]
              | "print" "(" Exps ")" ";"                 [strict]
// NOTE: print strict allows non-deterministic evaluation of its arguments
// Either keep like this but document, or otherwise make Exps seqstrict.
// Or define and use a different expression list here, which is seqstrict.
              | "try" Block "catch" "(" Id ")" Block
              | "throw" Exp ";"                          [strict]
              | "join" Exp ";"                           [strict]
              | "acquire" Exp ";"                        [strict]
              | "release" Exp ";"                        [strict]
              | "rendezvous" Exp ";"                     [strict]
The reason we allow Stmts as the first argument of for
instead of Stmt is because we want to allow more than one statement
to be executed when the loop is initialized. Also, as seen shortly, macros
may expand one statement into more statements; for example, an initialized
variable declaration statement var x=0; desugars into two statements,
namely var x; x=0;, so if we use Stmt instead of Stmts
in the production of for above then we risk that the macro expansion
of statement var x=0; happens before the macro expansion of for,
also shown below, in which case the latter would not apply anymore because
of syntactic mismatch.
syntax Stmt ::= Stmt Stmt  [right]

// I wish I were able to write the following instead, but confuses the parser.
//
// syntax Stmts ::= List{Stmt,""}
// syntax Top ::= Stmt | "function" Id "(" Ids ")" Block
// syntax Pgm ::= List{Top,""}
//
// With that, I could have also eliminated the empty block
This part desugars some of SIMPLE's language constructs into core ones.
We only want to give semantics to core constructs, so we get rid of the
derived ones before we start the semantics. All desugaring macros below are
straightforward.
rule if (E) S => if (E) S else {}
rule for(Start Cond; Step) {S} => {Start while (Cond) {S Step;}}
rule for(Start Cond; Step) {} => {Start while (Cond) {Step;}}
rule var E1:Exp, E2:Exp, Es:Exps; => var E1; var E2, Es;
rule var X:Id = E; => var X; X = E;
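As an illustration of how the last two macros combine, the Python sketch below splits a single var statement with several initialized declarators into the statements the macros would produce. The function name, the pair-based input encoding, and the string output are assumptions of this example, and array declarators such as a[10,10] are deliberately left out.

```python
def desugar_var(declarators):
    """declarators: (name, initializer or None) pairs from one `var` statement."""
    statements = []
    for name, init in declarators:
        # var E1, E2, Es; => var E1; var E2, Es;  (one declaration per variable)
        statements.append(f"var {name};")
        if init is not None:
            # var X = E; => var X; X = E;  (initialization becomes an assignment)
            statements.append(f"{name} = {init};")
    return statements
```

Under these assumptions, `desugar_var([("x", "10"), ("y", "23")])` gives the four statements `var x;`, `x = 10;`, `var y;`, `y = 23;` in that order.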
For the semantics, we can therefore assume from now on that each
conditional has both branches, that there are only while loops, and
that each variable is declared alone and without any initialization as part of
the declaration.
endmodule

module SIMPLE-UNTYPED
  imports SIMPLE-UNTYPED-SYNTAX
  imports DOMAINS
Before one starts adding semantic rules to a K definition, one needs to
define the basic semantic infrastructure consisting of definitions for
values and configuration. As discussed in the definitions
in the K tutorial, the values are needed to know when to stop applying
the heating rules and when to start applying the cooling rules corresponding
to strictness or context declarations. The configuration serves as a backbone
for the process of configuration abstraction which allows users to only
mention the relevant cells in each semantic rule, the rest of the configuration
context being inferred automatically. Although in some cases the configuration
could be automatically inferred from the rules, we believe that it is very
useful for language designers/semanticists to actually think of and design
their configuration explicitly, so the current implementation of K requires
one to define it.
We here define the values of the language that the various fragments of
programs evaluate to. First, integers and Booleans are values. As discussed,
arrays evaluate to special array reference values holding (1) a location from
where the array's elements are contiguously allocated in the store, and
(2) the size of the array. Functions evaluate to function values as
λ-abstractions (we do not need to evaluate functions to closures
because each function is executed in the fixed global environment and
function definitions cannot be nested). Like in IMP and other
languages, we finally tell the tool that values are K results.
syntax Val ::= Int | Bool | String
             | array(Int,Int)
             | lambda(Ids,Stmt)
syntax Exp ::= Val
syntax Exps ::= Vals
syntax Vals ::= Bottoms
syntax KResult ::= Val
                 | Vals  // TODO: should not need this
The inclusion of values in expressions follows the methodology of
syntactic definitions (like, e.g., in SOS): extend the syntax of the language
to encompass all values and additional constructs needed to give semantics.
In addition to that, it allows us to write the semantic rules using the
original syntax of the language, and to parse them with the same (now extended
with additional values) parser. If writing the semantics directly on the K
AST, using the associated labels instead of the syntactic constructs, then one
would not need to include values in expressions.
The K configuration of SIMPLE consists of a top level cell, T,
holding a threads cell, a global environment map cell genv
mapping the global variables and function names to their locations, a shared
store map cell store mapping each location to some value, a set cell
busy holding the locks which have been acquired but not yet released
by threads, a set cell terminated holding the unique identifiers of
the threads which already terminated (needed for join), input
and output list cells, and a nextLoc cell holding a natural
number indicating the next available location. Unlike in the small languages
in the K tutorial, where we used the fresh predicate to generate fresh
locations, in larger languages, like SIMPLE, we prefer to explicitly manage
memory. The location counter in nextLoc models an actual physical
location in the store; for simplicity, we assume arbitrarily large memory and
no garbage collection.

The threads cell contains one thread cell for each existing thread in the
program. Note that the thread cell has multiplicity *, which means that at
any given moment there could be zero, one or more thread cells. Each thread
cell contains a computation cell k, a control cell holding the various
control structures needed to jump to certain points of interest in the program
execution, a local environment map cell env mapping the thread local
variables to locations in the store, and finally a holds map cell
indicating what locks have been acquired by the thread and not released so far
and how many times (SIMPLE's locks are re-entrant). The control cell
currently contains only two subcells, a function stack fstack which
is a list and an exception stack xstack which is also a list.
One can add more control structures in the control cell, such as a
stack for break/continue of loops, etc., if the language is extended with more
control-changing constructs.

Note that all cells except for k are also initialized, in that they contain
a ground term of their corresponding sort. The k cell is initialized with
the program that will be passed to the K tool, as indicated by the $PGM
variable, followed by the execute task (defined shortly).
// the syntax declarations below are required because the sorts are
// referenced directly by a production and, because of the way KIL to KORE
// is implemented, the configuration syntax is not available yet
// should simply work once KIL is removed completely
// check other definitions for this hack as well

syntax ControlCell
syntax ControlCellFragment

configuration <T color="red">
                <threads color="orange">
                  <thread multiplicity="*" type="Map" color="yellow">
                    <id color="pink"> -1 </id>
                    <k color="green"> $PGM:Stmt ~> execute </k>
                  //<br/> // TODO(KORE): support latex annotations #1799
                    <control color="cyan">
                      <fstack color="blue"> .List </fstack>
                      <xstack color="purple"> .List </xstack>
                    </control>
                  //<br/> // TODO(KORE): support latex annotations #1799
                    <env color="violet"> .Map </env>
                    <holds color="black"> .Map </holds>
                  </thread>
                </threads>
              //<br/> // TODO(KORE): support latex annotations #1799
                <genv color="pink"> .Map </genv>
                <store color="white"> .Map </store>
                <busy color="cyan"> .Set </busy>
                <terminated color="red"> .Set </terminated>
              //<br/> // TODO(KORE): support latex annotations #1799
                <input color="magenta" stream="stdin"> .List </input>
                <output color="brown" stream="stdout"> .List </output>
                <nextLoc color="gray"> 0 </nextLoc>
              </T>
We start by defining the semantics of declarations (for variables,
arrays and functions).
The SIMPLE syntax was desugared above so that each variable is
declared alone and its initialization is done as a separate statement.
The semantic rule below matches resulting variable declarations of the
form var X; on top of the k cell (indeed, note that the k cell is
complete, or round, to the left, and is torn, or ruptured, to the right),
allocates a fresh location L in the store which is initialized with a
special value ⊥ (indeed, the unit ., or nothing, is matched anywhere
in the map, note the tears at both sides, and replaced with the
mapping L ↦ ⊥), and binds X to L in the local environment, shadowing
previous declarations of X, if any.

This possible shadowing of X requires us to update the
entire environment map, which is expensive and can significantly slow
down the execution of larger programs. On the other hand, since we know
that L is not already bound in the store, we simply add the binding
L ↦ ⊥ to the store, thus avoiding a potentially complete
traversal of the store map in order to update it. We prefer the approach
used for updating the store whenever possible, because, in addition to being
faster, it offers more true concurrency than the approach used for the
environment; indeed, according to the concurrent semantics of K, the store
is not frozen while L ↦ ⊥ is added to it, while the environment is frozen
during the update operation Env[L/X].

The variable declaration command is also removed from the top of the
computation cell and the fresh location counter is incremented. The
undefined symbol ⊥ added in the store is of sort KItem, instead of Val, on
purpose; this way, the store lookup rules will get stuck when one attempts
to look up an uninitialized location. All the above happen in one
transactional step, with the rule below. Note also how configuration
abstraction allows us to only mention the needed cells; indeed, as the
configuration above states, the k and env cells are actually located within
a thread cell within the threads cell, but one need not mention these: the
configuration context of the rule is automatically transformed to match the
declared configuration structure.
syntax KItem ::= "undefined"

rule <k> var X:Id; => .K ...</k>
     <env> Env => Env[X <- L] </env>
     <store>... .Map => L |-> undefined ...</store>
     <nextLoc> L => L +Int 1 </nextLoc>
The K semantics of the uni-dimensional array declaration is somewhat similar
to the above declaration of ordinary variables. First, note the
context declaration below, which requests the evaluation of the array
dimension. Once evaluated, say to a natural number N, then
N +Int 1 locations are allocated in the store for
an array of size N, the additional location (chosen to be the first
one allocated) holding the array reference value. The array reference
value array(L,N) states that the array has size N and its
elements are located contiguously in the store starting with location
L. The operation L … L' ↦ V, defined at the end of this
file in the auxiliary operation section, initializes each location in
the list L … L' to V. Note that, since the dimensions of
array declarations can be arbitrary expressions, this virtually means
that we can dynamically allocate memory in SIMPLE by means of array
declarations.
context var _:Id[HOLE];

rule <k> var X:Id[N:Int]; => .K ...</k>
     <env> Env => Env[X <- L] </env>
     <store>... .Map => L |-> array(L +Int 1, N)
                        (L +Int 1) ... (L +Int N) |-> undefined ...</store>
     <nextLoc> L => L +Int 1 +Int N </nextLoc>
  requires N >=Int 0
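The store layout produced by this rule is easy to get wrong by one, so here is a hedged Python sketch of the allocation. The function name, the tuple encoding of array references, the UNDEFINED marker, and the choice to return the new environment and counter are assumptions of the example rather than anything defined by K.

```python
UNDEFINED = "undefined"  # stands in for the `undefined` KItem


def declare_array(env, store, next_loc, name, n):
    """Allocate n + 1 locations: one for the reference, n for the elements."""
    if n < 0:
        # The K rule requires N >=Int 0; otherwise it does not apply.
        raise ValueError("array size must be non-negative")
    loc = next_loc
    new_env = {**env, name: loc}        # Env[X <- L]
    store[loc] = ("array", loc + 1, n)  # L |-> array(L +Int 1, N)
    for i in range(loc + 1, loc + n + 1):
        store[i] = UNDEFINED            # (L +Int 1) ... (L +Int N) |-> undefined
    return new_env, loc + 1 + n         # nextLoc becomes L +Int 1 +Int N
```

Declaring an array of size 3 with the counter at 0 thus places the reference at location 0, the elements at locations 1 through 3, and leaves the counter at 4.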
SIMPLE allows multi-dimensional arrays. For semantic simplicity, we
desugar them all into uni-dimensional arrays by code transformation.
This way, we only need to give semantics to uni-dimensional arrays.
First, note that the context rule above actually evaluates all the array
dimensions (that's why we defined the expression lists strict!).
Upon evaluating the array dimensions, the code generation rule below
desugars multi-dimensional array declaration to uni-dimensional declarations.
To this aim, we introduce two special unique variable identifiers,
$1 and $2. The first variable, $1, iterates
through and initializes each element of the first dimension with an array
of the remaining dimensions, declared as variable $2:
syntax Id ::= "$1" [token] | "$2" [token]

rule var X:Id[N1:Int, N2:Int, Vs:Vals];
  => var X[N1];
     {
       for(var $1 = 0; $1 <= N1 - 1; ++$1) {
         var $2[N2, Vs];
         X[$1] = $2;
       }
     }
Ideally, one would like to perform syntactic desugarings like the one
above before the actual semantics. Unfortunately, that was not possible in
this case because the dimension expressions of the multi-dimensional array need
to be evaluated first. Indeed, the desugaring rule above does not work if the
dimensions of the declared array are arbitrary expressions, because they can
have side effects (e.g., a[++x,++x]) and those side effects would be
propagated each time the expression is evaluated in the desugaring code (note
that both the loop condition and the nested multi-dimensional declaration
would need to evaluate the expressions given as array dimensions).
Functions are evaluated to λ-abstractions and stored like any other
values in the store. A binding is added into the environment for the function
name to the location holding its body. Similarly to the C language, SIMPLE
only allows function declarations at the top level of the program. More
precisely, the subsequent semantics of SIMPLE only works well when one
respects this requirement. Indeed, the simplistic context-free parser
generated by the grammar above is more generous than we may want, in that it
allows function declarations anywhere any declaration is allowed, including
inside arbitrary blocks. However, as the rule below shows, we are not
storing the declaration environment with the λ-abstraction value as
closures do. Instead, as seen shortly, we switch to the global environment
whenever functions are invoked, which is consistent with our requirement that
functions should only be declared at the top. Thus, if one declares local
functions, then one may see unexpected behaviors (e.g., when one shadows a
global variable before declaring a local function). The type checker of
SIMPLE, also defined in K (see examples/simple/typed/static),
discards programs which do not respect this requirement.
rule <k> function F(Xs) S => .K ...</k>
     <env> Env => Env[F <- L] </env>
     <store>... .Map => L |-> lambda(Xs, S) ...</store>
     <nextLoc> L => L +Int 1 </nextLoc>
When we are done with the first pass (pre-processing), the computation
cell k contains only the token execute (see the configuration
declaration above, where the computation item execute was placed
right after the program in the k cell of the initial configuration)
and the cell genv is empty. In this case, we have to call
main() and to initialize the global environment by transferring the
contents of the local environment into it. We prefer to do it this way, as
opposed to processing all the top level declarations directly within the global
environment, because we want to avoid duplication of semantics: the syntax of
the global declarations is identical to that of their corresponding local
declarations, so the semantics of the latter suffices provided that we copy
the local environment into the global one once we are done with the
pre-processing. We want this separate pre-processing step precisely because
we want to create the global environment. All (top-level) functions end up
having their names bound in the global environment and, as seen below, they
are executed in that same global environment; all these mean, in particular,
that the functions "see" each other, allowing for mutual recursion, etc.
syntax KItem ::= "execute"

rule <k> execute => main(.Exps); </k>
     <env> Env </env>
     <genv> .Map => Env </genv>
We next define the K semantics of all the expression constructs.
When a variable X is the first computational task, and X is bound to some
location L in the environment, and L is mapped to some value V in the
store, then we rewrite X into V:
rule <k> X:Id => V ...</k>
     <env>... X |-> L ...</env>
     <store>... L |-> V:Val ...</store>
Note that the rule above excludes reading ⊥, because ⊥ is not
a value and V is checked at runtime to be a value.
The increment construct is tricky, because we want to allow both ++x and
++a[5]. Therefore, we need to extract the lvalue of the expression to
increment. To do that, we state that the expression to increment should be
wrapped by the auxiliary lvalue operation and then evaluated. The semantics
of this auxiliary operation is defined at the end of this file. For now, all
we need to know is that it takes an expression and evaluates to a location
value. Location values, also defined at the end of the file, are integers
wrapped with the operation loc, to distinguish them from ordinary
integers.
context ++(HOLE => lvalue(HOLE))

rule <k> ++loc(L) => I +Int 1 ...</k>
     <store>... L |-> (I => I +Int 1) ...</store>
There is nothing special about the following rules. They rewrite the
language constructs to their library counterparts when their arguments
become values of expected sorts:
rule I1 + I2 => I1 +Int I2
rule Str1 + Str2 => Str1 +String Str2
rule I1 - I2 => I1 -Int I2
rule I1 * I2 => I1 *Int I2
rule I1 / I2 => I1 /Int I2  requires I2 =/=K 0
rule I1 % I2 => I1 %Int I2  requires I2 =/=K 0
rule - I => 0 -Int I
rule I1 < I2 => I1 <Int I2
rule I1 <= I2 => I1 <=Int I2
rule I1 > I2 => I1 >Int I2
rule I1 >= I2 => I1 >=Int I2
The equality and inequality constructs reduce to syntactic comparison
of the two argument values (which is what the equality on K terms does).
rule V1:Val == V2:Val => V1 ==K V2
rule V1:Val != V2:Val => V1 =/=K V2
The logical negation is clear, but the logical conjunction and disjunction
are short-circuited:
rule ! T => notBool(T)
rule true  && E => E
rule false && _ => false
rule true  || _ => true
rule false || E => E
Untyped SIMPLE does not check array bounds (the dynamically typed version of
it, in examples/simple/typed/dynamic, does check for array out of
bounds). The first rule below desugars the multi-dimensional array access to
uni-dimensional array access; recall that the array access operation was
declared strict, so all sub-expressions involved are already values at this
stage. The second rule rewrites the array access to a lookup operation at a
precise location; we prefer to do it this way to avoid locking the store.
The semantics of the auxiliary lookup operation is straightforward,
and is defined at the end of the file.
// The [anywhere] feature is underused, because it would only be used
// at the top of the computation or inside the lvalue wrapper. So it
// may not be worth, or we may need to come up with a special notation
// allowing us to enumerate contexts for [anywhere] rules.
rule V:Val[N1:Int, N2:Int, Vs:Vals] => V[N1][N2, Vs]
  [anywhere]

rule array(L,_)[N:Int] => lookup(L +Int N)
  [anywhere]
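Taken together, the two rules peel off one index at a time and turn each step into a store lookup at base location plus index. The Python sketch below models that process; the tuple encoding ("array", base, size), the function names, and the sample store are assumptions made for illustration, and, like untyped SIMPLE, the model performs no bounds checking.

```python
def element_location(array_ref, index):
    """array(L, _)[N] => lookup(L +Int N): compute the element's location."""
    _tag, base, _size = array_ref
    return base + index


def access(store, array_ref, indices):
    """V[N1, N2, Vs] => V[N1][N2, Vs]: apply the indices one dimension at a time."""
    value = array_ref
    for index in indices:
        value = store[element_location(value, index)]
    return value


# A 2x2 array: the reference at location 0 points at two row references.
sample_store = {
    0: ("array", 1, 2),
    1: ("array", 3, 2), 2: ("array", 5, 2),
    3: 10, 4: 11, 5: 20, 6: 21,
}
```

With this store, accessing indices [1, 0] first reads location 2 (the second row reference) and then location 5, which holds 20.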
The size of the array is stored in the array reference value, and the
sizeOf construct was declared strict, so:
rule sizeOf(array(_,N)) => N
Function application was strict in both its arguments, so we can
assume that both the function and its arguments are evaluated to
values (the former expected to be a λ-abstraction). The first
rule below matches a well-formed function application on top of the
computation and performs the following steps atomically: it switches
to the function body followed by return; (for the case in
which the function does not use an explicit return statement); it
pushes the remaining computation, the current environment, and the
current control data onto the function stack (the remaining
computation can thus also be discarded from the computation cell,
because an unavoidable subsequent return statement, see
above, will always recover it from the stack); it switches the
current environment (which is being pushed on the function stack) to
the global environment, which is where the free variables in the
function body should be looked up; it binds the formal parameters to
fresh locations in the new environment, and stores the actual
arguments to those locations in the store (this latter step is easily
done by reducing the problem to variable declarations, whose semantics
we have already defined; the auxiliary operation mkDecls is
defined at the end of the file).
The second rule pops the computation, the environment and the control
data from the function stack when a return statement is encountered as
the next computational task, passing the returned value to the popped
computation (the popped computation was the context in which the
returning function was called). Note that the pushing/popping of the
control data is crucial. Without it, one may have a function that
contains an exception block with a return statement inside, which
would put the xstack cell in an inconsistent state (since the
exception block modifies it, but that modification should be
irrelevant once the function returns).
We add an artificial nothing value to the language, which is returned by the
nullary return; statements.
syntax KItem ::= (Map,K,ControlCellFragment)
rule <k> lambda(Xs,S)(Vs:Vals) ~> K => mkDecls(Xs,Vs) S return; </k>
     <control>
       <fstack> .List => ListItem((Env,K,C)) ...</fstack>
       C
     </control>
     <env> Env => GEnv </env>
     <genv> GEnv </genv>
rule <k> return(V:Val); ~> _ => V ~> K </k>
     <control>
       <fstack> ListItem((Env,K,C)) => .List ...</fstack>
       (_ => C)
     </control>
     <env> _ => Env </env>
syntax Val ::= "nothing"
rule return; => return nothing;
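The push/pop discipline of the two rules above can be sketched in Python as follows. This is a simplified model, not the K semantics: the state is a hypothetical dictionary, the control data is reduced to the exception stack alone, and parameters are bound directly to values instead of going through fresh store locations.

```python
def call(state, params, args, body):
    """Push the caller's context, then switch to the callee."""
    # The saved triple mirrors ListItem((Env, K, C)) on the function stack.
    state["fstack"].append((state["env"], state["k"], list(state["xstack"])))
    # Free variables resolve in the global environment, not the caller's.
    state["env"] = dict(state["genv"])
    state["env"].update(zip(params, args))   # stands in for mkDecls(Xs, Vs)
    state["k"] = [body, "return;"]


def ret(state, value):
    """Pop the caller's context and hand it the returned value."""
    env, k, xstack = state["fstack"].pop()
    state["env"], state["xstack"] = env, xstack
    state["k"] = [value] + k


state = {"k": ["rest"], "env": {"x": 1}, "genv": {"g": 0},
         "fstack": [], "xstack": []}
call(state, ["n"], [7], "body")
state["xstack"].append("handler pushed by a try inside the callee")
ret(state, 42)
print(state["k"], state["env"], state["xstack"])
```

The handler pushed inside the callee disappears on return because the saved control data is restored wholesale, which is the point of the remark about pushing and popping the control data.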
Like for division-by-zero, it is left unspecified what happens
when the nothing value is used in domain calculations. For
example, from the perspective of the language semantics,
7 +Int nothing can evaluate to anything, or
may not evaluate at all (be undefined). If one wants to make sure that
such artificial values are never misused, then one needs to define a static
checker (also using K, like our type checker in
examples/simple/typed/static) and reject programs that misuse them.
Note that, unlike the undefined symbol ⊥, which had the sort K
instead of Val, we defined nothing to be a value. That
is because, as explained above, we do not want the program to get
stuck when nothing is returned by a function. Instead, we want the
behavior to be unspecified; in particular, if one is careful to never
use the returned value in domain computations, as happens when we
call a function for its side effects (e.g., with a statement of the
form f(x);), then the program does not get stuck.
The read() expression construct simply evaluates to the next
input value, at the same time discarding the input value from the
input cell.
rule <k> read() => I ...</k> <input> ListItem(I:Int) => .List ...</input>
In SIMPLE, like in C, assignments are expression constructs and not statement
constructs. To make an assignment a statement, all one needs to do is follow
it with a semicolon ; (see the semantics for expression statements below).
Like for the increment, we want to allow assignments not only to variables but
also to array elements, e.g., e1[e2] = e3 where e1 evaluates
to an array reference, e2 to a natural number, and e3 to any
value. Thus, we first compute the lvalue of the left-hand-side expression
that appears in an assignment, and then we do the actual assignment to the
resulting location:
context (HOLE => lvalue(HOLE)) = _
rule <k> loc(L) = V:Val => V ...</k> <store>... L |-> (_ => V) ...</store>
We next define the K semantics of statements.
Empty blocks are simply discarded, as shown in the first rule below.
For non-empty blocks, we schedule the enclosed statement but we have to
make sure the environment is recovered after the enclosed statement executes.
Recall that we allow local variable declarations, whose scope is the block
enclosing them. That is the reason for which we have to recover the
environment after the block. This allows us to have a very simple semantics
for variable declarations, as we did above. One can make the two rules below
computational if one wants them to count as computational steps.
rule {} => .K
rule <k> { S } => S ~> setEnv(Env) ...</k> <env> Env </env>
The basic definition of environment recovery is straightforward and
given in the section on auxiliary constructs at the end of the file.
There are two common alternatives to the above semantics of blocks.
One is to keep track of the variables which are declared in the block and only
recover those at the end of the block. This way one does more work for
variable declarations but conceptually less work for environment recovery; we
say conceptually because it is not clear that one actually does
less work when AC matching is involved. The other alternative is to
work with a stack of environments instead of a flat environment, and push the
current environment when entering a block and pop it when exiting it. This
way, one does more work when accessing variables (since one has to search the
variable in the environment stack in a top-down manner), but on the other hand
uses smaller environments and the definition gets closer to an implementation.
Based on experience with dozens of language semantics and other K definitions,
we have found that our approach above is the best trade-off between elegance
and efficiency (especially since rewrite engines have built-in techniques to
lazily copy terms, by need, thus not creating unnecessary copies),
so it is the one that we follow in general.
Sequential composition is desugared into K's builtin sequentialization
operation (recall that, like in C, the semi-colon ; is not a
statement separator in SIMPLE; it is either a statement terminator or a
construct for a statement from an expression). Note that K allows us
to define the semantics of SIMPLE in such a way that statements eventually
dissolve from the top of the computation when they are completed; this is in
sharp contrast to (artificially) evaluating them to a special
skip statement value and then getting rid of that special value, as
is the case in other semantic approaches (where everything must evaluate
to something). This means that once S₁ completes in the rule below, S₂
automatically becomes the next computation item without any additional
(explicit or implicit) rules.
rule S1:Stmt S2:Stmt => S1 ~> S2
A subtle aspect of the rule above is that S₁ is declared to have sort
Stmts and not Stmt. That is because desugaring macros can indeed
produce left associative sequential composition of statements. For example,
the code var x=0; x=1; is desugared to
(var x; x=0;) x=1;, so although originally the first term of
the sequential composition had sort Stmt, after desugaring it became
of sort Stmts. Note that the attribute [right] associated
to the sequential composition production is an attribute of the syntax, and not
of the semantics: e.g., it tells the parser to parse
var x; x=0; x=1; as var x; (x=0; x=1;), but it
does not tell the rewrite engine to rewrite (var x; x=0;) x=1; to
var x; (x=0; x=1;).
Expression statements are only used for their side effects, so their result
value is simply discarded. Common examples of expression statements are ones
of the form ++x;, x=e;, e1[e2]=e3;, etc.
rule _:Val; => .K
Since the conditional was declared with the strict(1) attribute, we
can assume that its first argument will eventually be evaluated. The rules
below cover the only two possibilities in which the conditional is allowed to
proceed (otherwise the rewriting process gets stuck).
rule if ( true) S else _ => S
rule if (false) _ else S => S
The simplest way to give the semantics of the while loop is by unrolling.
Note, however, that its unrolling is only allowed when the while loop reaches
the top of the computation (to avoid non-termination of unrolling). The
simple while loop semantics below works because our while loops in SIMPLE are
indeed very basic. If we allowed break/continue of loops then we would need
a completely different semantics, which would also involve the control
cell.
rule while (E) S => if (E) {S while(E)S}
The print statement was strict, so all its arguments are now
evaluated (recall that print is variadic). We append each of
its evaluated arguments to the output buffer, and discard the residual
print statement with an empty list of arguments.
rule <k> print(V:Val, Es => Es); ...</k> <output>... .List => ListItem(V) </output>
rule print(.Vals); => .K
SIMPLE allows parametric exceptions, in that one can throw and catch a
particular value. The statement try S₁ catch(X) S₂
proceeds with the evaluation of S₁. If S₁ evaluates normally, i.e.,
without any exception thrown, then S₂ is discarded and the execution
continues normally. If S₁ throws an exception with a statement of the
form throw E, then E is first evaluated to some value V
(throw was declared to be strict), then V is bound to X, then
S₂ is evaluated in the new environment while the remainder of S₁ is
discarded, then the environment is recovered and the execution continues
normally with the statement following the try S₁ catch(X) S₂ statement.
Exceptions can be nested and the statements in the
catch part (S₂ in our case) can throw exceptions to the
upper level. One should be careful with how one handles the control data
structures here, so that the abrupt changes of control due to exception
throwing and to function returns interact correctly with each other.
For example, we want to allow function calls inside the statement S₁ in
a try S₁ catch(X) S₂ block which can throw an exception
that is not caught by the function but instead is propagated to the
try S₁ catch(X) S₂ block that called the function.
Therefore, we have to make sure that the function stack as well as other
potential control structures are also properly modified when the exception
is thrown to correctly recover the execution context. This can be easily
achieved by pushing/popping the entire current control context onto the
exception stack. The three rules below modularly do precisely the above.
syntax KItem ::= (Id,Stmt,K,Map,ControlCellFragment)
syntax KItem ::= "popx"
rule <k> (try S1 catch(X) {S2} => S1 ~> popx) ~> K </k>
     <control>
       <xstack> .List => ListItem((X, S2, K, Env, C)) ...</xstack>
       C
     </control>
     <env> Env </env>
rule <k> popx => .K ...</k>
     <xstack> ListItem(_) => .List ...</xstack>
rule <k> throw V:Val; ~> _ => { var X = V; S2 } ~> K </k>
     <control>
       <xstack> ListItem((X, S2, K, Env, C)) => .List ...</xstack>
       (_ => C)
     </control>
     <env> _ => Env </env>
The catch statement S₂ needs to be executed in the original environment,
but where the thrown value V is bound to the catch variable X. We here
chose to rely on two previously defined constructs when giving semantics to
the catch part of the statement: (1) the variable declaration with
initialization, for binding X to V; and (2) the block construct for
preventing X from shadowing variables in the original environment upon the
completion of S₂.
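The interplay of the three exception rules can be sketched in Python as follows. This is a hedged model, not the K definition: the state dictionary and the function names try_catch, popx and throw are hypothetical, the control data is reduced to the function stack alone, and statements are uninterpreted strings or tuples.

```python
def try_catch(state, s1, x, s2):
    """First rule: save the context, then schedule S1 followed by popx."""
    rest = state["k"]                     # K, the computation after the try
    saved = (x, s2, rest, state["env"], list(state["fstack"]))
    state["xstack"].append(saved)
    state["k"] = [s1, "popx"] + rest


def popx(state):
    """Second rule: S1 finished normally, so drop the unused handler."""
    state["xstack"].pop()


def throw(state, value):
    """Third rule: unwind to the innermost handler."""
    x, s2, rest, env, fstack = state["xstack"].pop()
    state["env"], state["fstack"] = env, fstack
    # The handler runs as the block { var X = V; S2 } before the saved K.
    state["k"] = [("block", ("var", x, value), s2)] + rest


state = {"k": ["after"], "env": {"y": 0}, "fstack": [], "xstack": []}
try_catch(state, "S1", "e", "S2")
state["fstack"].append("frame of a function called inside S1")
state["env"] = {"local": 1}
throw(state, 5)
print(state["k"], state["fstack"], state["env"])
```

The function frame pushed while S1 was running is gone after the throw, because the whole saved control context replaces the current one.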
SIMPLE's threads can be created and terminated dynamically, and can
synchronize by acquiring and releasing re-entrant locks and by rendezvous.
We discuss the seven rules giving the semantics of these operations below.
Threads can be created by any other threads using the spawn S
construct. The spawn expression construct evaluates to the unique identifier
of the newly created thread and, at the same time, a new thread cell is added
into the configuration, initialized with the S statement and sharing the
same environment with the parent thread. Note that the newly created
thread cell is torn. That means that the remaining cells are added
and initialized automatically as described in the definition of SIMPLE's
configuration. This is part of K's configuration abstraction mechanism.
rule <thread>...
       <k> spawn S => !T:Int ...</k>
       <env> Env </env>
     ...</thread>
     (.Bag => <thread>...
                <k> S </k>
                <env> Env </env>
                <id> !T </id>
              ...</thread>)
Dually to the above, when a thread terminates, its assigned computation (the
contents of its k cell) is empty, so the thread can be dissolved.
However, since no discipline is imposed on how locks are acquired and released,
it can be the case that a terminating thread still holds locks. Those locks
must be released, so other threads attempting to acquire them do not deadlock.
We achieve that by removing all the locks held by the terminating thread in its
holds cell from the set of busy locks in the busy cell
(keys(H) returns the domain of the map H as a set, that is, only
the locks themselves ignoring their multiplicity). As seen below, a lock is
added to the busy cell as soon as it is acquired for the first time
by a thread. The unique identifier of the terminated thread is also collected
into the terminated cell, so the join construct knows which
threads have terminated.
rule (<thread>... <k>.K</k> <holds>H</holds> <id>T</id> ...</thread> => .Bag)
     <busy> Busy => Busy -Set keys(H) </busy>
     <terminated>... .Set => SetItem(T) ...</terminated>
Thread joining is now straightforward: all we need to do is to check whether
the identifier of the thread to be joined is in the terminated cell.
If yes, then the join statement dissolves and the joining thread
continues normally; if not, then the joining thread gets stuck.
rule <k> join T:Int; => .K ...</k> <terminated>... SetItem(T) ...</terminated>
There are two cases to distinguish when a thread attempts to acquire a lock
(in SIMPLE any value can be used as a lock):
(1) The thread does not currently have the lock, in which case it has to
take it provided that the lock is not already taken by another thread (see
the side condition of the first rule).
(2) The thread already has the lock, in which case it just increments its
counter for the lock (the locks are re-entrant). These two cases are captured
by the two rules below:
rule <k> acquire V:Val; => .K ...</k>
     <holds>... .Map => V |-> 0 ...</holds>
     <busy> Busy (.Set => SetItem(V)) </busy>
  requires (notBool(V in Busy))
rule <k> acquire V; => .K ...</k>
     <holds>... V:Val |-> (N => N +Int 1) ...</holds>
Similarly, there are two corresponding cases to distinguish when a thread
releases a lock:
(1) The thread holds the lock more than once, in which case all it needs to do
is to decrement the lock counter.
(2) The thread holds the lock only once, in which case it needs to remove it
from its holds cell and also from the shared busy cell,
so other threads can acquire it if they need to.
rule <k> release V:Val; => .K ...</k>
     <holds>... V |-> (N => N -Int 1) ...</holds>
  requires N >Int 0
rule <k> release V; => .K ...</k> <holds>... V:Val |-> 0 => .Map ...</holds>
     <busy>... SetItem(V) => .Set ...</busy>
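The counting scheme behind the acquire and release rules can be modeled with a few lines of Python. This is a sketch under assumptions, not the K semantics: holds is a per-thread dictionary, busy is a shared set, and a blocked acquire is reported by returning False rather than by the thread getting stuck.

```python
def acquire(holds, busy, lock):
    """Return True if an acquire rule can fire, False if the thread blocks."""
    if lock in holds:
        holds[lock] += 1          # re-entrant acquire: bump the counter
        return True
    if lock in busy:
        return False              # taken by another thread: no rule applies
    holds[lock] = 0               # first acquire: the counter starts at 0
    busy.add(lock)
    return True


def release(holds, busy, lock):
    """Undo one acquire; the lock leaves `busy` only when the counter is 0."""
    if holds[lock] > 0:
        holds[lock] -= 1
    else:
        del holds[lock]
        busy.discard(lock)


busy = set()
t1, t2 = {}, {}                   # the holds cells of two threads
acquire(t1, busy, "m")
acquire(t1, busy, "m")
print(acquire(t2, busy, "m"))     # t1 still holds the lock
release(t1, busy, "m")
release(t1, busy, "m")
print(acquire(t2, busy, "m"))     # the lock is free again
```

Note that the counter records the number of additional acquisitions, so a lock held once maps to 0, as in the rules above. Releasing a lock that is not held raises a KeyError in this model, whereas in the K definition no rule matches.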
In addition to synchronization through acquire and release of locks, SIMPLE
also provides a construct for rendezvous synchronization. A thread whose next
statement to execute is rendezvous(V) gets stuck until another
thread reaches an identical statement; when that happens, the two threads
drop their rendezvous statements and continue their executions. If three
threads happen to have an identical rendezvous statement as their next
statement, then precisely two of them will synchronize and the other will
remain blocked until another thread reaches a similar rendezvous statement.
The rule below is as simple as it can be. Note, however, that, again, it is
K's mechanism for configuration abstraction that makes it work as desired:
since the only cell that can occur multiple times and contain a k cell is
the thread cell, the only way to concretize the rule below to the
actual configuration of SIMPLE is to include each k cell in a
thread cell.
rule <k> rendezvous V:Val; => .K ...</k>
     <k> rendezvous V; => .K ...</k>
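The pairing behavior of the rendezvous rule, including the three-thread case, can be illustrated with this Python sketch. The encoding is hypothetical: threads maps a thread identifier to its list of pending tasks, and the function picks the first matching pair in dictionary order, whereas K leaves the choice nondeterministic.

```python
def rendezvous_step(threads):
    """Fire the rendezvous rule once; return the pair of thread ids, or None."""
    waiting = {}  # rendezvous value -> id of the first thread seen waiting on it
    for tid, k in threads.items():
        head = k[0] if k else None
        if isinstance(head, tuple) and head[0] == "rendezvous":
            value = head[1]
            if value in waiting:
                other = waiting[value]
                # Both threads drop their rendezvous statement in one step.
                threads[tid] = k[1:]
                threads[other] = threads[other][1:]
                return (other, tid)
            waiting[value] = tid
    return None


threads = {
    1: [("rendezvous", 7), "a"],
    2: [("rendezvous", 7), "b"],
    3: [("rendezvous", 7), "c"],
}
print(rendezvous_step(threads))  # two of the three threads synchronize
print(rendezvous_step(threads))  # the third one stays blocked
```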
In this section we define all the auxiliary constructs used in the
above semantics.
The mkDecls auxiliary construct turns a list of identifiers
and a list of values into a sequence of corresponding variable
declarations.
syntax Stmt ::= mkDecls(Ids,Vals) [function]
rule mkDecls((X:Id, Xs:Ids), (V:Val, Vs:Vals)) => var X=V; mkDecls(Xs,Vs)
rule mkDecls(.Ids,.Vals) => {}
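The recursion in mkDecls is the usual pairwise walk over two lists. A minimal Python sketch of it follows; the name mk_decls and the string rendering of declarations are assumptions for illustration, and the empty list stands in for the empty block.

```python
def mk_decls(ids, vals):
    """Pair identifiers with values, as mkDecls(Xs, Vs) does."""
    if not ids and not vals:
        return []                       # mkDecls(.Ids, .Vals) => {}
    if not ids or not vals:
        # No K rule matches lists of different lengths.
        raise ValueError("arity mismatch")
    # var X=V; followed by the declarations for the remaining pairs.
    return [f"var {ids[0]}={vals[0]};"] + mk_decls(ids[1:], vals[1:])


print(mk_decls(["x", "y"], [1, 2]))
```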
The operation below is straightforward.
syntax Exp ::= lookup(Int)
rule <k> lookup(L) => V ...</k> <store>... L |-> V:Val ...</store>
We have already discussed the environment recovery auxiliary operation in the
IMP++ tutorial:
// TODO: eliminate the env wrapper, like we did in IMP++
syntax KItem ::= setEnv(Map)
rule <k> setEnv(Env) => .K ...</k> <env> _ => Env </env>
While theoretically sufficient, the basic definition for environment
recovery alone is suboptimal. Consider a loop while (E)S,
whose semantics (see above) was given by unrolling. Suppose S
is a block. Then the semantics of blocks above, together with the
unrolling semantics of the while loop, will yield a computation
structure in the k cell that increasingly grows, adding a new
environment recovery task right in front of the already existing sequence of
similar environment recovery tasks (this phenomenon is similar to the "tail
recursion" problem). Of course, when we have a sequence of environment
recovery tasks, we only need to keep the last one. The elegant rule below
does precisely that, thus avoiding the unnecessary computation explosion
problem:
rule (setEnv(_) => .K) ~> setEnv(_)
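The effect of applying the rule above exhaustively can be sketched in Python. The encoding is an assumption: the computation is a list of tagged tuples, and the function removes every setEnv task that is immediately followed by another setEnv task, so only the last one of each run survives.

```python
def collapse_set_env(tasks):
    """Drop every setEnv task that is immediately followed by another one."""
    result = []
    for i, task in enumerate(tasks):
        followed_by_set_env = (
            i + 1 < len(tasks) and tasks[i + 1][0] == "setEnv"
        )
        if task[0] == "setEnv" and followed_by_set_env:
            continue          # rule (setEnv(_) => .K) ~> setEnv(_)
        result.append(task)
    return result


k = [("setEnv", {"x": 1}), ("setEnv", {"x": 0}),
     ("stmt", "print(x);"), ("setEnv", {})]
print(collapse_set_env(k))
```

Keeping the later task is sound because executing two recovery tasks back to back leaves the environment set by the second one.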
In fact, the above follows a common convention in K for recovery
operations of cell contents: the meaning of a computation task of the form
cell(C) that reaches the top of the computation is that the current
contents of cell cell is discarded and gets replaced with C. We
did not add support for these special computation tasks in our current
implementation of K, so we need to define them as above.
For convenience in giving the semantics of constructs like the increment and
the assignment, that we want to operate the same way on variables and on
array elements, we used an auxiliary lvalue(E) construct which was
expected to evaluate to the lvalue of the expression E. This is only
defined when E has an lvalue, that is, when E is either a variable or
evaluates to an array element. lvalue(E) evaluates to a value of
the form loc(L), where L is the location where the value of E
can be found; for clarity, we use loc to structurally distinguish
natural numbers from location values. In giving semantics to lvalue
there are two cases to consider. (1) If E is a variable, then all we need
to do is to grab its location from the environment. (2) If E is an array
element, then we first evaluate the array and its index in order to identify
the exact location of the element of concern, and then return that location;
the last rule below works because its preceding context declarations ensure
that the array and its index are evaluated, and then the rule for array lookup
(defined above) rewrites the evaluated array access construct to its
corresponding store lookup operation.
// For parsing reasons, we prefer to allow lvalue to take a K
syntax Exp ::= lvalue(K)
syntax Val ::= loc(Int)
// Local variable
rule <k> lvalue(X:Id => loc(L)) ...</k> <env>... X |-> L:Int ...</env>
// Array element: evaluate the array and its index;
// then the array lookup rule above applies.
context lvalue(_::Exp[HOLE::Exps])
context lvalue(HOLE::Exp[_::Exps])
// Finally, return the address of the desired object member
rule lvalue(lookup(L:Int) => loc(L))
The following operation initializes a sequence of locations with the same
value:
syntax Map ::= Int "..." Int "|->" K [function]
rule N...M |-> _ => .Map requires N >Int M
rule N...M |-> K => N |-> K (N +Int 1)...M |-> K requires N <=Int M
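The two rules above define a map by recursion on the lower bound. The same recursion reads as follows in Python; the name init_range is hypothetical and a dictionary stands in for the K Map.

```python
def init_range(n, m, value):
    """Model of `N...M |-> K`: map every location in [n, m] to `value`."""
    if n > m:
        return {}                                   # requires N >Int M
    rest = init_range(n + 1, m, value)              # (N +Int 1)...M |-> K
    return {n: value, **rest}                       # requires N <=Int M


print(init_range(3, 5, 0))
```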
The semantics of SIMPLE is now complete. Make sure you kompile the
definition with the right options in order to generate the desired model.
No kompile options are needed if you only want to execute the definition
(and thus get an interpreter), but if you want to search for different
program behaviors then you need to kompile with the --enable-search option.
endmodule
Author: Grigore Roșu (grosu@illinois.edu)
Organization: University of Illinois at Urbana-Champaign
Author: Traian Florin Șerbănuță (traian.serbanuta@unibuc.ro)
Organization: University of Bucharest
This is the K definition of the static semantics of the typed SIMPLE
language, or in other words, a type system for the typed SIMPLE
language in K. We do not re-discuss the various features of the
SIMPLE language here. The reader is referred to the untyped version of
the language for such discussions. We here only focus on the new and
interesting problems raised by the addition of type declarations, and
what it takes to devise a type system/checker for the language.
When designing a type system for a language, no matter in what
paradigm, we have to decide upon the intended typing policy. Note
that we can have multiple type systems for the same language, one for
each typing policy. For example, should we accept programs which
don't have a main function? Or should we allow functions that do not
return explicitly? Or should we allow functions whose type expects
them to return a value (say an int) to use a plain
return; statement, which returns no value, like in C?
And so on and so forth. Typically, there are two opposite tensions
when designing a type system. On the one hand, you want your type
system to be as permissive as possible, that is, to accept as many
programs that do not get stuck when executed with the untyped
semantics as possible; this will keep the programmers using your
language happy. On the other hand, you want your type system to have
a reasonable performance when implemented; this will keep both the
programmers and the implementers of your language happy. For example,
a type system for rejecting programs that could perform
division-by-zero is not expected to be feasible in general. A simple
guideline when designing typing policies is to imagine how the
semantics of the untyped language may get stuck and try to prevent
those situations from happening.
Before we give the K type system of SIMPLE formally, we discuss,
informally, the intended typing policy:
Each program should contain a main() function. Indeed,
the untyped SIMPLE semantics will get stuck on any program which does
not have a main function.
Each primitive value has its own type, which can be int,
bool, or string. There is also a type void
for nonexistent values, for example for the result of a function meant
to return no value (but only be used for its side effects, like a
procedure).
The syntax of untyped SIMPLE is extended to allow type
declarations for all the variables, including array variables. This is
done in a C/Java-style. For example, int x; or
int x=7, y=x+3;, or int[][][] a[10,20];
(the latter defines a 10 × 20 matrix of arrays of integers).
Recall from untyped SIMPLE that, unlike in C/Java, our multi-dimensional
arrays use comma-separated arguments, although they have the array-of-array
semantics.
Functions are also typed in a C/Java style. However, since in SIMPLE
we allow functions to be passed to and returned by other functions, we also
need function types. We will use the conventional higher-order arrow-notation
for function types, but will separate the argument types with commas. For
example, a function returning an array of bool elements and
taking as argument an array x of two-integer-argument functions
returning an integer, is declared using a syntax of the form
bool[] f(((int,int)->int)[] x) { ... }
and has the type ((int,int)->int)[] -> bool[].
We allow any variable declarations at the top level. Functions
can only be declared at the top level. Each function can only access the
other functions and variables declared at the top level, or its own locally
declared variables. SIMPLE has static scoping.
The various expression and statement constructs take only elements of
the expected types.
Increment and assignment can operate both on variables and on array
elements. For example, if f has type int->int[][] and
function g has the type int->int, then the
increment expression ++f(7)[g(2),g(3)] is valid.
Functions should only return values of their declared result
type. To give the programmers more flexibility, we allow functions to
use return; statements to terminate without returning an
actual value, or to not explicitly use any return statement,
regardless of their declared return type. This flexibility can be
handy when writing programs using certain functions only for their
side effects. Nevertheless, as the dynamic semantics shows, a return
value is automatically generated when an explicit return
statement is not encountered.
For simplicity, we here limit exceptions to only throw and catch
integer values. We leave it as an exercise to the reader to extend the
semantics to allow throwing and catching arbitrary-type exceptions.
As in programming languages like Java, one can go even further and
define a semantics where thrown exceptions are propagated through
try-catch statements until one of the corresponding type is found.
We will do this when we define the KOOL language, not here.
To keep the definition of SIMPLE simple, here we do not attempt to
reject programs which throw uncaught exceptions.
Like in untyped SIMPLE, some constructs can be desugared into a
smaller set of basic constructs. In general, it should be clear why a
program does not type by looking at the top of the k cells in
its stuck configuration.
module SIMPLE-TYPED-STATIC-SYNTAX
  imports DOMAINS-SYNTAX
The syntax of typed SIMPLE extends that of untyped SIMPLE with support
for declaring types to variables and functions.
syntax Id ::= "main" [token]
Primitive, array and function types, as well as lists (or tuples) of types.
The lists of types are useful for function arguments.
syntax Type ::= "void" | "int" | "bool" | "string"
              | Type "[" "]"
              | "(" Type ")" [bracket]
              > Types "->" Type
syntax Types ::= List{Type,","} [overload(exps)]
Variable and function declarations have the expected syntax. For variables,
we basically just replaced the var keyword of untyped SIMPLE with a
type. For functions, besides replacing the function keyword with a
type, we also introduce a new syntactic category for typed variables,
Param, and lists over it.
syntax Param ::= Type Id
syntax Params ::= List{Param,","}
syntax Stmt ::= Type Exps ";"
              | Type Id "(" Params ")" Block
The syntax of expressions is identical to that in untyped SIMPLE,
except for the logical conjunction and disjunction which have
different strictness attributes, because they now have different
evaluation strategies.
syntax Exp ::= Int | Bool | String | Id
             | "(" Exp ")" [bracket]
             | "++" Exp
             > Exp "[" Exps "]" [strict]
             > Exp "(" Exps ")" [strict]
             | "-" Exp [strict]
             | "sizeOf" "(" Exp ")" [strict]
             | "read" "(" ")"
             > left:
               Exp "*" Exp [strict, left]
             | Exp "/" Exp [strict, left]
             | Exp "%" Exp [strict, left]
             > left:
               Exp "+" Exp [strict, left]
             | Exp "-" Exp [strict, left]
             > non-assoc:
               Exp "<" Exp [strict, non-assoc]
             | Exp "<=" Exp [strict, non-assoc]
             | Exp ">" Exp [strict, non-assoc]
             | Exp ">=" Exp [strict, non-assoc]
             | Exp "==" Exp [strict, non-assoc]
             | Exp "!=" Exp [strict, non-assoc]
             > "!" Exp [strict]
             > left:
               Exp "&&" Exp [strict, left]
             | Exp "||" Exp [strict, left]
             > "spawn" Block
             > Exp "=" Exp [strict(2), right]
Note that spawn has not been declared strict. This may
seem unexpected, because the child thread shares the same environment
with the parent thread, so from a typing perspective the spawned
statement makes the same sense in a child thread as it makes in the
parent thread. The reason for not declaring it strict is because we
want to disallow programs where the spawned thread calls the
return statement, because those programs would get stuck in
the dynamic semantics. The type semantics of spawn below will reject
such programs.
We still need lists of expressions, defined below, but note that we do
not need lists of identifiers anymore. They have been replaced by the lists
of parameters.
syntax Exps ::= List{Exp,","} [strict, overload(exps)]
The statements have the same syntax as in untyped SIMPLE, except for
the exceptions, which now type their parameter. Note that, unlike in untyped
SIMPLE, all statement constructs which have arguments and are not desugared
are strict, including the conditional and the while. Indeed, from a
typing perspective, they are all strict: first type their arguments and then
type the actual construct.
syntax Block ::= "{" "}"
               | "{" Stmt "}"
syntax Stmt ::= Block
              | Exp ";" [strict]
              | "if" "(" Exp ")" Block "else" Block [avoid, strict]
              | "if" "(" Exp ")" Block [macro]
              | "while" "(" Exp ")" Block [strict]
              | "for" "(" Stmt Exp ";" Exp ")" Block [macro]
              | "return" Exp ";" [strict]
              | "return" ";"
              | "print" "(" Exps ")" ";" [strict]
              | "try" Block "catch" "(" Param ")" Block [strict(1)]
              | "throw" Exp ";" [strict]
              | "join" Exp ";" [strict]
              | "acquire" Exp ";" [strict]
              | "release" Exp ";" [strict]
              | "rendezvous" Exp ";" [strict]
Note that the sequential composition is now sequentially strict,
because, unlike in the dynamic semantics where statements dissolved,
they now reduce to the stmt type, which is a result.
syntax Stmt ::= Stmt Stmt [seqstrict, right]
We use the same desugaring macros as in untyped SIMPLE, but, of
course, including the types of the involved variables.
rule if (E) S => if (E) S else {}
rule for(Start Cond; Step) {S:Stmt} => {Start while(Cond){S Step;}}
rule for(Start Cond; Step) {} => {Start while(Cond){Step;}}
rule T:Type E1:Exp, E2:Exp, Es:Exps; => T E1; T E2, Es; [anywhere]
rule T:Type X:Id = E; => T X; X = E; [anywhere]
endmodule
module SIMPLE-TYPED-STATIC
  imports SIMPLE-TYPED-STATIC-SYNTAX
  imports DOMAINS
Here we define the type system of SIMPLE. Like concrete semantics,
type systems defined in K are also executable. However, K type
systems turn into type checkers instead of interpreters when executed.
The typing process is done in two (overlapping) phases. In the first
phase the global environment is built, which contains type bindings
for all the globally declared variables and functions. For functions,
the declared types will be "trusted" during the first phase and
simply bound to their corresponding function names and placed in the
global type environment. At the same time, type-checking tasks that
the function bodies indeed respect their claimed types are generated.
All these tasks are (concurrently) verified during the second phase.
This way, all the global variable and function declarations are
available in the global type environment and can be used in order to
type-check each function code. This is consistent with the semantics
of untyped SIMPLE, where functions can access all the global variables
and can call any other function declared in the same program. The
two phases may overlap because of the K concurrent semantics. For
example, a function task can be started while the first phase is still
running; moreover, it may even complete before the first phase does,
namely when all the global variables and functions that it needs have
already been processed and made available in the global environment by
the first phase task.
The idea is to start with a configuration holding the program to type
in one of its cells, then apply rewrite rules on it mixing types and
language syntax, and eventually obtain a type instead of the original
program. In other words, the program reduces to its type using
the K rules giving the type system of the language. In doing so,
additional typing tasks for function bodies are generated and solved
the same way. If this rewriting process gets stuck, then we say that
the program is not well-typed. Otherwise the program is well-typed
(by definition). We did not need types for statements and for blocks
as part of the typed SIMPLE syntax, because programmers are not allowed
to use such types explicitly. However, we are going to need them in the
type system, because blocks and statements reduce to them.
We start by allowing types to be used inside expressions and statements in
our language. This way, types can be used together with language syntax in
subsequent K rules without any parsing errors. Like in the type system of
IMP++ in the K tutorial, we prefer to group the block and statement types
under one syntactic sub-category of types, because this allows us to more
compactly state that certain terms can be either blocks or statements. Also,
since programs and program fragments will reduce to their types, in order
for the strictness and context declarations to be executable we state that
types are results (just as we did in the IMP++ tutorial).
  syntax Exp ::= Type
  syntax Exps ::= Types
  syntax BlockOrStmtType ::= "block" | "stmt"
  syntax Type ::= BlockOrStmtType
  syntax Block ::= BlockOrStmtType
  syntax KResult ::= Type
                   | Types  //TODO: remove this, eventually
The configuration of our type system consists of a tasks cell holding
various typing task cells, and a global type environment. Each task
includes a k cell holding the code to type, a tenv cell holding the
local type environment, and a returnType cell holding the return type of
the currently checked function. The latter is needed in order to check
whether return statements return values of the expected type.
Initially, the program is placed in a k cell inside a task cell. Since
the cells with multiplicity ? are not included in the initial
configuration, the task cell holding the original program in its k cell
will contain no other subcells.
  configuration <T color="yellow">
                  <tasks color="orange">
                    <task multiplicity="*" color="yellow" type="Set">
                      <k color="green"> $PGM:Stmt </k>
                      <tenv multiplicity="?" color="cyan"> .Map </tenv>
                      <returnType multiplicity="?" color="black"> void </returnType>
                    </task>
                  </tasks>
                  // <br/>
                  <gtenv color="blue"> .Map </gtenv>
                </T>
Variable declarations type as statements, that is, they reduce to the
type stmt. There are only two cases that need to be considered: when a
simple variable is declared and when an array variable is declared. The
macros at the end of the syntax module above take care of reducing other
variable declarations, including ones where the declared variables are
initialized, to only these two cases. The first case has two subcases:
when the variable declaration is global (i.e., the task cell contains
only the k cell), in which case it is added to the global type
environment, checking at the same time that the variable has not already
been declared; and when the variable declaration is local (i.e., a tenv
cell is available), in which case it is simply added to the local type
environment, possibly shadowing previous homonymous variables. The
second case reduces to the first, incrementally moving the array
dimensions into the type until the array becomes a simple variable.
  rule <task> <k> T:Type X:Id; => stmt ...</k> </task>
       <gtenv> Rho (.Map => X |-> T) </gtenv>
    requires notBool(X in keys(Rho))
  rule <k> T:Type X:Id; => stmt ...</k> <tenv> Rho => Rho[X <- T] </tenv>

  context _:Type _::Exp[HOLE::Exps];

// The rule below may need to sort E to Exp in the future, if the
// parser gets stricter; without that information, it may not be able
// to complete the LHS into T E[int,Ts],.Exps; (and similarly for the RHS)
  rule T:Type E:Exp[int,Ts:Types]; => T[] E[Ts];
// I want to write the rule below as _:Type (E:Exp[.Types] => E),
// but the list completion seems to not work well with that.
  rule T:Type E:Exp[.Types]; => T E;
Functions are allowed to be declared only at the top level (the task
cell holds only its k subcell). Each function declaration reduces to a
variable declaration (a binding of its name to its declared function
type), but also adds a task into the tasks cell. The task consists of a
typing of the statement declaring all the function parameters followed
by the function body, together with the expected return type of the
function. The getTypes and mkDecls functions, defined at the end of the
file in the section on auxiliary operations, extract the list of types
and make a sequence of variable declarations from a list of function
parameters, respectively. Note that, although in the dynamic semantics
we include a terminating return statement at the end of the function
body to eliminate from the analysis the case when the function does not
provide an explicit return, we do not need to include such a return
statement here. That is because return statements type to stmt anyway,
and the entire code of the function body needs to type anyway.
  rule <task> <k> T:Type F:Id(Ps:Params) S => getTypes(Ps)->T F; ...</k> </task>
       (.Bag => <task>
                  <k> mkDecls(Ps) S </k>
                  <tenv> .Map </tenv>
                  <returnType> T </returnType>
                </task>)
Checking that main() exists
Once the entire program is processed (generating appropriate tasks to
type check its function bodies), we can dissolve the main task cell (the
one holding only a k subcell). Since we want to enforce that programs
include a main function, we also generate a function task executing
main() to ensure that it types (remove this task creation if you do not
want your type system to reject programs without a main function).
rule <task> <k> stmt => main(.Exps); </k> (.Bag => <tenv> .Map </tenv>) </task>
Similarly, once a non-main task (i.e., one which contains a tenv
subcell) is completed using the subsequent rules (i.e., its k cell holds
only the block or stmt type), we can dissolve its corresponding cell.
Note that it is important to ensure that we only dissolve tasks
containing a tenv cell with the rule below, because the main task should
not dissolve this way! It should do what the above rule says. In the
end, there should be no task cell left in the configuration when the
program correctly type checks.
rule <task>... <k> _:BlockOrStmtType </k> <tenv> _ </tenv> ...</task> => .Bag
The first three rewrite rules below reduce the primitive values to
their types, as we typically do when we define type systems in K.
  rule _:Int => int
  rule _:Bool => bool
  rule _:String => string
There are three cases to distinguish for variable lookup: (1) if the
variable is bound in the local type environment, then look its type up
there; (2) if a local environment exists and the variable is not bound
in it, then look its type up in the global environment; (3) finally,
if there is no local environment, meaning that we are executing the
top-level pass, then look the variable's type up in the global
environment, too.
  rule <k> X:Id => T ...</k> <tenv>... X |-> T ...</tenv>

  rule <k> X:Id => T ...</k> <tenv> Rho </tenv> <gtenv>... X |-> T ...</gtenv>
    requires notBool(X in keys(Rho))

  rule <task> <k> X:Id => T ...</k> </task> <gtenv>... X |-> T ...</gtenv>
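The three lookup cases can be summarized in a few lines of Python. This is a sketch only: `lookup_type` is a hypothetical helper, environments are modeled as dictionaries, and a missing tenv cell is modeled as `None`.

```python
def lookup_type(name, gtenv, tenv=None):
    """Return the type of `name`, or None when the lookup would get stuck.

    `tenv` is None for the top-level task, which has no tenv cell.
    """
    if tenv is not None and name in tenv:
        return tenv[name]        # case 1: bound in the local environment
    return gtenv.get(name)       # cases 2 and 3: fall back to the globals
```

A local binding shadows a global one, and an unbound name yields no type at all, which corresponds to the rewriting getting stuck.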
We want the increment operation to apply to any lvalue, including
array elements, not only to variables. For that reason, we define a
special context extracting the type of the argument of the increment
operation only if that argument is an lvalue. Otherwise the rewriting
process gets stuck. The operation ltype is defined at the end of this
file, in the auxiliary operation section. It essentially acts as a
filter, getting stuck if its argument is not an lvalue and letting it
reduce otherwise. The type of the lvalue is expected to be an integer in
order to be allowed to be incremented, as seen in the rule
++ int => int below.
  context ++(HOLE => ltype(HOLE))
  rule ++ int => int
The rules below are straightforward and self-explanatory:
  rule int + int => int
  rule string + string => string
  rule int - int => int
  rule int * int => int
  rule int / int => int
  rule int % int => int
  rule - int => int
  rule int < int => bool
  rule int <= int => bool
  rule int > int => bool
  rule int >= int => bool
  rule T:Type == T => bool
  rule T:Type != T => bool
  rule bool && bool => bool
  rule bool || bool => bool
  rule ! bool => bool
Array access requires each index to type to an integer, and the
array type to be at least as deep as the number of indexes:
// NOTE:
// We used to need parentheses in the RHS, to avoid capturing Ts as an attribute
// Let's hope that is not a problem anymore.

  rule (T[])[int, Ts:Types] => T[Ts]
  rule T:Type[.Types] => T
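To make the depth requirement concrete, here is a small Python model of the two array-access rules. It is a sketch under the assumption that types are written as strings such as "int[][]"; `index_type` is a hypothetical name.

```python
def index_type(array_type, index_types):
    """Type of array_type[i1, ..., in], or None when typing gets stuck."""
    result = array_type
    for t in index_types:
        if t != "int" or not result.endswith("[]"):
            return None          # non-integer index, or more indexes than dimensions
        result = result[:-2]     # peel off one array dimension per index
    return result
```

For example, indexing an `int[][]` with one integer leaves an `int[]`, with two integers an `int`, and with three integers nothing at all.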
sizeOf only needs to check that its argument is an array:
rule sizeOf(_T[]) => int
The read expression construct types to an integer, while print types
to a statement provided that all its arguments type to integers or
strings.
  rule read() => int

  rule print(T:Type, Ts => Ts); requires T ==K int orBool T ==K string
  rule print(.Types); => stmt
The special context and the rule for assignment below are similar
to those for increment: the LHS of the assignment must be an lvalue
and, in that case, it must have the same type as the RHS, which then
becomes the type of the assignment.
  context (HOLE => ltype(HOLE)) = _
  rule T:Type = T => T
Function application requires the type of the function and the
types of the passed values to be compatible. Note that a special case
is needed to handle the no-argument case:
  rule (Ts:Types -> T)(Ts) => T requires Ts =/=K .Types
  rule (void -> T)(.Types) => T
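The following Python sketch mirrors the two application rules. It assumes a function type is modeled as a pair of a parameter-type tuple and a return type, and that a function without parameters has the parameter tuple `("void",)`, matching what getTypes produces for an empty parameter list; `apply_type` is a hypothetical name.

```python
def apply_type(fun_type, arg_types):
    """Result type of a call, or None when the call does not type."""
    params, ret = fun_type
    if not arg_types:
        # No arguments: only a function of type void -> T may be called.
        return ret if params == ("void",) else None
    # Otherwise the argument types must match the parameter types exactly.
    return ret if tuple(arg_types) == params else None
```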
The returned value must have the same type as the declared function
return type. If an empty return is encountered, then we should check
that we are in a function (and not a thread) context, that is, a
returnType cell must be available:
  rule <k> return T:Type; => stmt ...</k> <returnType> T </returnType>
  rule <k> return; => stmt ...</k> <returnType> _ </returnType>
To avoid having to recover type environments after blocks, we prefer
to start a new task for the block body, making sure that the new task
is passed the same type environment and return type cells. The value
returned by return statements must have the same type as stated in the
returnType cell. The print variadic function is allowed to only print
integers and strings. The thrown exceptions can only have integer type.
  rule {} => block

  rule <task> <k> {S} => block ...</k> <tenv> Rho </tenv> R </task>
       (.Bag => <task> <k> S </k> <tenv> Rho </tenv> R </task>)
rule _:Type; => stmt
while loop
  rule if (bool) block else block => stmt
  rule while (bool) block => stmt
We currently force the parameters of exceptions to only be integers.
Moreover, for simplicity, we assume that integer exceptions can be
thrown from anywhere, including from functions which do not define
any try-catch block (with the currently unchecked ‒also for
simplicity‒ expectation that the caller functions would catch those
exceptions).
  rule try block catch(int X:Id) {S} => {int X; S}
  rule try block catch(int X:Id) {} => {int X;}
  rule throw int; => stmt
There is nothing special about typing the concurrency constructs,
except that we do not want the spawned thread to return, so we do not
include any returnType cell in the new task cell for the thread
statement. As with the functions above, we do not check for thrown
exceptions which are not caught.
  rule <k> spawn S => int ...</k> <tenv> Rho </tenv>
       (.Bag => <task> <k> S </k> <tenv> Rho </tenv> </task>)
  rule join int; => stmt
  rule acquire _:Type; => stmt
  rule release _:Type; => stmt
  rule rendezvous _:Type; => stmt

  rule _:BlockOrStmtType _:BlockOrStmtType => stmt
The function mkDecls turns a list of parameters into a list of
variable declarations.
  syntax Stmt ::= mkDecls(Params)  [function]
  rule mkDecls(T:Type X:Id, Ps:Params) => T X; mkDecls(Ps)
  rule mkDecls(.Params) => {}
The ltype context allows only expressions which have an lvalue to
evaluate.
  syntax LValue ::= Id
  rule isLValue(_:Exp[_:Exps]) => true
  syntax Exp ::= LValue  // K should be able to infer this
                         // if not added, then it gets stuck with an Id on k cell
// Instead of the second LValue production above you can use a rule:
// rule isLValue(_:Exp[_:Exps]) => true

  syntax Exp ::= ltype(Exp)
// context ltype(HOLE:LValue)
// The above context does not work due to some error, so we write instead
  context ltype(HOLE) requires isLValue(HOLE)
The function getTypes is the same as in SIMPLE typed dynamic.
  syntax Types ::= getTypes(Params)  [function]
  rule getTypes(T:Type _:Id) => T, .Types   // I would like to not use .Types
  rule getTypes(T:Type _:Id, P, Ps) => T, getTypes(P,Ps)
  rule getTypes(.Params) => void, .Types
endmodule
Author: Grigore Roșu (grosu@illinois.edu)
Organization: University of Illinois at Urbana-Champaign
Author: Traian Florin Șerbănuță (traian.serbanuta@unibuc.ro)
Organization: University of Bucharest
This is the K dynamic semantics of the typed SIMPLE language.
It is very similar to the semantics of the untyped SIMPLE, the
difference being that we now dynamically check the typing policy
described in the static semantics of typed SIMPLE. Because of the
dynamic nature of the semantics, we can also perform some additional
checks which were not possible in the static semantics, such as
memory leaks due to accessing an array out of its bounds. We will
highlight the differences between the dynamically typed and the
untyped SIMPLE as we proceed with the semantics. We recommend that the
reader consult the typing policy and the syntax of types discussed
in the static semantics of the typed SIMPLE language.
module SIMPLE-TYPED-DYNAMIC-SYNTAX imports DOMAINS-SYNTAX
The syntax of typed SIMPLE extends that of untyped SIMPLE with support
for declaring the types of variables and functions.
The syntax below is identical to that of the static semantics of typed
SIMPLE. However, the K strictness attributes are like those of the untyped
SIMPLE, to capture the desired evaluation strategies of the various language
constructs.
syntax Id ::= "main" [token]
  syntax Type ::= "void" | "int" | "bool" | "string"
                | Type "[" "]"
                | "(" Type ")"             [bracket]
                > Types "->" Type
  syntax Types ::= List{Type,","}          [overload(exps)]

  syntax Param ::= Type Id
  syntax Params ::= List{Param,","}

  syntax Stmt ::= Type Exps ";"
                | Type Id "(" Params ")" Block

  syntax Exp ::= Int | Bool | String | Id
               | "(" Exp ")"             [bracket]
               | "++" Exp
               > Exp "[" Exps "]"        [strict]
               > Exp "(" Exps ")"        [strict]
               | "-" Exp                 [strict]
               | "sizeOf" "(" Exp ")"    [strict]
               | "read" "(" ")"
               > left:
                 Exp "*" Exp             [strict, left]
               | Exp "/" Exp             [strict, left]
               | Exp "%" Exp             [strict, left]
               > left:
                 Exp "+" Exp             [strict, left]
               | Exp "-" Exp             [strict, left]
               > non-assoc:
                 Exp "<" Exp             [strict, non-assoc]
               | Exp "<=" Exp            [strict, non-assoc]
               | Exp ">" Exp             [strict, non-assoc]
               | Exp ">=" Exp            [strict, non-assoc]
               | Exp "==" Exp            [strict, non-assoc]
               | Exp "!=" Exp            [strict, non-assoc]
               > "!" Exp                 [strict]
               > left:
                 Exp "&&" Exp            [strict(1), left]
               | Exp "||" Exp            [strict(1), left]
               > "spawn" Block
               > Exp "=" Exp             [strict(2), right]
Like in the static semantics, there is no need for lists of identifiers
(because we now have lists of parameters).
  syntax Exps ::= List{Exp,","}          [strict, overload(exps)]
  syntax Val
  syntax Vals ::= List{Val,","}          [overload(exps)]

  syntax Block ::= "{" "}"
                | "{" Stmt "}"

  syntax Stmt ::= Block
                | Exp ";"                               [strict]
                | "if" "(" Exp ")" Block "else" Block   [avoid, strict(1)]
                | "if" "(" Exp ")" Block                [macro]
                | "while" "(" Exp ")" Block
                | "for" "(" Stmt Exp ";" Exp ")" Block  [macro]
                | "print" "(" Exps ")" ";"              [strict]
                | "return" Exp ";"                      [strict]
                | "return" ";"
                | "try" Block "catch" "(" Param ")" Block
                | "throw" Exp ";"                       [strict]
                | "join" Exp ";"                        [strict]
                | "acquire" Exp ";"                     [strict]
                | "release" Exp ";"                     [strict]
                | "rendezvous" Exp ";"                  [strict]

  syntax Stmt ::= Stmt Stmt                             [right]
The same desugaring macros as in the statically typed SIMPLE.
  rule if (E) S => if (E) S else {}
  rule for(Start Cond; Step) {S:Stmt} => {Start while(Cond){S Step;}}
  rule for(Start Cond; Step) {} => {Start while(Cond){Step;}}
  rule T:Type E1:Exp, E2:Exp, Es:Exps; => T E1; T E2, Es;   [anywhere]
  rule T:Type X:Id = E; => T X; X = E;                      [anywhere]
endmodule

module SIMPLE-TYPED-DYNAMIC
  imports SIMPLE-TYPED-DYNAMIC-SYNTAX
  imports DOMAINS
These are similar to those of untyped SIMPLE, except that the array
references and the function abstractions now also hold their types.
These types are needed in order to easily compute the type of any
value in the language (see the auxiliary typeOf operation at the end of
this module).
  syntax Val ::= Int | Bool | String
               | array(Type,Int,Int)
               | lambda(Type,Params,Stmt)
  syntax Exp ::= Val
  syntax Exps ::= Vals
  syntax KResult ::= Val
                   | Vals  // TODO: should not need this
The configuration is almost identical to that of untyped SIMPLE,
except for a returnType cell inside the control cell. This returnType
cell will hold, like in the static semantics of typed SIMPLE, the
expected type of the value returned by the function being executed. The
contents of this cell will be set whenever a function is invoked and
will be checked whenever the evaluation of the function body encounters
an explicit return statement.
  // the syntax declarations below are required because the sorts are
  // referenced directly by a production and, because of the way KIL to KORE
  // is implemented, the configuration syntax is not available yet
  // should simply work once KIL is removed completely
  // check other definitions for this hack as well
  syntax ControlCell
  syntax ControlCellFragment

  configuration <T color="red">
                  <threads color="orange">
                    <thread multiplicity="*" color="yellow" type="Map">
                      <id color="pink"> 0 </id>
                      <k color="green"> ($PGM:Stmt ~> execute) </k>
                      // <br/>
                      <control color="cyan">
                        <fstack color="blue"> .List </fstack>
                        <xstack color="purple"> .List </xstack>
                        <returnType color="LimeGreen"> void </returnType>
                      </control>
                      // <br/>
                      <env color="violet"> .Map </env>
                      <holds color="black"> .Map </holds>
                    </thread>
                  </threads>
                  // <br/>
                  <genv color="pink"> .Map </genv>
                  <store color="white"> .Map </store>
                  <busy color="cyan">.Set</busy>
                  <terminated color="red"> .Set </terminated>
                  <input color="magenta" stream="stdin"> .List </input>
                  <output color="brown" stream="stdout"> .List </output>
                  <nextLoc color="gray"> 0 </nextLoc>
                </T>
The undefined construct is now parameterized by a type.
A main difference between untyped SIMPLE and dynamically typed SIMPLE
is that the latter assigns a type to each of its locations and that
type cannot be changed during the execution of the program. We do not
do any memory management in our semantic definitions here, so
locations cannot be reclaimed, garbage collected and/or reused. Each
location corresponds precisely to an allocated variable or array
element, whose type was explicitly or implicitly declared in the
program and does not change. It is therefore safe to type each
location and then never allow that type to change. The typed
undefined values effectively assign both a type and an undefined value
to a location.
  syntax KItem ::= undefined(Type)

  rule <k> T:Type X:Id; => .K ...</k>
       <env> Env => Env[X <- L] </env>
       <store>... .Map => L |-> undefined(T) ...</store>
       <nextLoc> L:Int => L +Int 1 </nextLoc>
The dynamic semantics of typed array declarations is similar to that
in untyped SIMPLE, but we have to make sure that we associate the
right type to the allocated locations.
  rule <k> T:Type X:Id[N:Int]; => .K ...</k>
       <env> Env => Env[X <- L] </env>
       <store>... .Map => L |-> array(T, L +Int 1, N)
                          (L +Int 1)...(L +Int N) |-> undefined(T) ...</store>
       <nextLoc> L:Int => L +Int 1 +Int N </nextLoc>
    requires N >=Int 0

  context _:Type _::Exp[HOLE::Exps];
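The effect of the array declaration rule on the store can be modeled in Python as follows. This is a sketch, not part of the definition: the store and environment are plain dictionaries, values are tagged tuples, and `declare_array` is a hypothetical name.

```python
def declare_array(store, env, next_loc, elem_type, name, n):
    """Allocate a one-dimensional array of n typed, undefined elements.

    Returns the new value of the next free location.
    """
    if n < 0:
        raise ValueError("stuck: negative array size")
    env[name] = next_loc
    # The array reference records its element type, first element and size.
    store[next_loc] = ("array", elem_type, next_loc + 1, n)
    for loc in range(next_loc + 1, next_loc + n + 1):
        store[loc] = ("undefined", elem_type)   # every element is typed
    return next_loc + 1 + n
```

Declaring `int a[3]` in an empty store therefore uses four locations: one for the array reference and three for its elements.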
The desugaring of multi-dimensional arrays into unidimensional
ones is also similar to that in untyped SIMPLE, although we have to
make sure that all the declared variables have the right types. The
auxiliary operation T<Vs>, defined at the end of the file, adds the
length of Vs dimensions to the type T.
// TODO: Check the desugaring below to be consistent with the one for untyped simple

  syntax Id ::= "$1" [token] | "$2" [token]
  rule T:Type X:Id[N1:Int, N2:Int, Vs:Vals];
    => T[]<Vs> X[N1];
       {
         T[][]<Vs> $1=X;
         for(int $2=0; $2 <= N1 - 1; ++$2) {
           T X[N2,Vs];
           $1[$2] = X;
         }
       }
Store all function parameters, as well as the return type, as part
of the lambda abstraction. In the spirit of dynamic typing, we will
make sure that parameters are well typed when the function is invoked.
  rule <k> T:Type F:Id(Ps:Params) S => .K ...</k>
       <env> Env => Env[F <- L] </env>
       <store>... .Map => L |-> lambda(T, Ps, S) ...</store>
       <nextLoc> L => L +Int 1 </nextLoc>
main()
When done with the first pass, call main().
  syntax KItem ::= "execute"
  rule <k> execute => main(.Exps); </k>
       <env> Env </env>
       <genv> .Map => Env </genv>
rule <k> X:Id => V ...</k> <env>... X |-> L ...</env> <store>... L |-> V:Val ...</store>
  context ++(HOLE => lvalue(HOLE))
  rule <k> ++loc(L) => I +Int 1 ...</k>
       <store>... L |-> (I:Int => I +Int 1) ...</store>
  rule I1 + I2 => I1 +Int I2
  rule Str1 + Str2 => Str1 +String Str2
  rule I1 - I2 => I1 -Int I2
  rule I1 * I2 => I1 *Int I2
  rule I1 / I2 => I1 /Int I2 requires I2 =/=K 0
  rule I1 % I2 => I1 %Int I2 requires I2 =/=K 0
  rule - I => 0 -Int I
  rule I1 < I2 => I1 <Int I2
  rule I1 <= I2 => I1 <=Int I2
  rule I1 > I2 => I1 >Int I2
  rule I1 >= I2 => I1 >=Int I2
  rule V1:Val == V2:Val => V1 ==K V2
  rule V1:Val != V2:Val => V1 =/=K V2
  rule ! T => notBool(T)
  rule true  && E => E
  rule false && _ => false
  rule true  || _ => true
  rule false || E => E
Check array bounds, as part of the dynamic typing policy.
// Same comment as for simple untyped regarding [anywhere]
  rule V:Val[N1:Int, N2:Int, Vs:Vals] => V[N1][N2, Vs]
    [anywhere]

// Same comment as for simple untyped regarding [anywhere]
  rule array(_:Type, L:Int, M:Int)[N:Int] => lookup(L +Int N)
    requires N >=Int 0 andBool N <Int M  [anywhere]
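The bounds check in the second rule amounts to the following Python sketch, which assumes an array value is modeled as the tuple ("array", type, first location, size); `element_location` is a hypothetical name.

```python
def element_location(array_val, index):
    """Location of array_val[index], or None when the access is out of bounds."""
    _tag, _elem_type, base, size = array_val
    if 0 <= index < size:
        return base + index
    return None                  # the K rule does not apply: execution gets stuck
```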
rule sizeOf(array(_,_,N)) => N
Define function call and return together, to see their relationship.
Note that the operation mkDecls now declares properly typed
instantiated variables, and that the semantics of return also checks
that the type of the returned value is the expected one.
  syntax KItem ::= (Type,Map,K,ControlCellFragment)

  rule <k> lambda(T,Ps,S)(Vs:Vals) ~> K => mkDecls(Ps,Vs) S return; </k>
       <control>
         <fstack> .List => ListItem((T',Env,K,C)) ...</fstack>
         <returnType> T' => T </returnType>
         C
       </control>
       <env> Env => GEnv </env>
       <genv> GEnv </genv>

  rule <k> return V:Val; ~> _ => V ~> K </k>
       <control>
         <fstack> ListItem((T',Env,K,C)) => .List ...</fstack>
         <returnType> T => T' </returnType>
         (_ => C)
       </control>
       <env> _ => Env </env>
    requires typeOf(V) ==K T   // check the type of the returned value
Like the undefined above, nothing also gets tagged with a type now.
The empty return statement is completed to return the nothing value
tagged as expected.
  syntax Val ::= nothing(Type)
  rule <k> return; => return nothing(T); ...</k> <returnType> T </returnType>
rule <k> read() => I ...</k> <input> ListItem(I:Int) => .List ...</input>
The assignment now checks that the type of the assigned location is
preserved:
  context (HOLE => lvalue(HOLE)) = _

  rule <k> loc(L) = V:Val => V ...</k> <store>... L |-> (V' => V) ...</store>
    requires typeOf(V) ==K typeOf(V')
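The type-preservation check on assignment can be illustrated with a short Python model. It is a sketch under assumptions: values are Python ints, bools and strings plus tagged tuples such as ("undefined", "int"), and `type_of` covers only a subset of the cases handled by the typeOf operation (function values are left out); both function names are hypothetical.

```python
def type_of(value):
    """Type tag of a value; mirrors a subset of the typeOf rules."""
    if isinstance(value, bool):      # test bool first: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, str):
        return "string"
    kind, elem_type = value[0], value[1]
    if kind == "array":
        return elem_type + "[]"
    return elem_type                 # ("undefined", T) and ("nothing", T)


def assign(store, loc, value):
    """Write value to loc only if it has the type already held there."""
    if type_of(value) != type_of(store[loc]):
        return False                 # the K rule does not apply: stuck
    store[loc] = value
    return True
```

A location initialized with `("undefined", "int")` thus accepts integers for the rest of the execution and nothing else.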
  rule {} => .K
  rule <k> { S } => S ~> setEnv(Env) ...</k>  <env> Env </env>
rule S1:Stmt S2:Stmt => S1 ~> S2
rule _:Val; => .K
  rule if ( true) S else _ => S
  rule if (false) _ else S => S
rule while (E) S => if (E) {S while(E)S}
We only allow printing integers and strings:
  rule <k> print(V:Val, Es => Es); ...</k> <output>... .List => ListItem(V) </output>
    requires typeOf(V) ==K int orBool typeOf(V) ==K string
  rule print(.Vals); => .K
Exception parameters are now typed, but note that the semantics below
works correctly only when the thrown exception has the same type as
the innermost try-catch parameter. To keep things simple, for the time
being we can assume that SIMPLE only throws and catches integer
values, in which case our semantics below works fine:
  syntax KItem ::= (Param,Stmt,K,Map,ControlCellFragment)  // Param instead of Id

  syntax KItem ::= "popx"

  rule <k> (try S1 catch(P) S2 => S1 ~> popx) ~> K </k>
       <control>
         <xstack> .List => ListItem((P, S2, K, Env, C)) ...</xstack>
         C
       </control>
       <env> Env </env>

  rule <k> popx => .K ...</k>
       <xstack> ListItem(_) => .List ...</xstack>

  rule <k> throw V:Val; ~> _ => { T X = V; S2 } ~> K </k>
       <control>
         <xstack> ListItem((T:Type X:Id, S2, K, Env, C)) => .List ...</xstack>
         (_ => C)
       </control>
       <env> _ => Env </env>
rule <thread>... <k> spawn S => !T:Int +Int 1 ...</k> <env> Env </env> ...</thread> (.Bag => <thread>... <k> S </k> <env> Env </env> <id> !T +Int 1 </id> ...</thread>)
rule (<thread>... <k>.K</k> <holds>H</holds> <id>T</id> ...</thread> => .Bag) <busy> Busy => Busy -Set keys(H) </busy> <terminated>... .Set => SetItem(T) ...</terminated>
rule <k> join T:Int; => .K ...</k> <terminated>... SetItem(T) ...</terminated>
  rule <k> acquire V:Val; => .K ...</k>
       <holds>... .Map => V |-> 0 ...</holds>
       <busy> Busy (.Set => SetItem(V)) </busy>
    requires (notBool(V in Busy:Set))

  rule <k> acquire V; => .K ...</k>
       <holds>... V:Val |-> (N:Int => N +Int 1) ...</holds>

  rule <k> release V:Val; => .K ...</k>
       <holds>... V |-> (N => N:Int -Int 1) ...</holds>
    requires N >Int 0

  rule <k> release V; => .K ...</k>
       <holds>... V:Val |-> 0 => .Map ...</holds>
       <busy>... SetItem(V) => .Set ...</busy>
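The acquire and release rules above implement re-entrant locks: each thread counts in its holds cell how many extra times it has acquired a lock, while the shared busy cell records which locks are taken. A Python sketch of this bookkeeping, with hypothetical function names and with `holds` as a per-thread dictionary and `busy` as a shared set, could look as follows.

```python
def acquire(holds, busy, lock):
    """Try to acquire `lock` for the thread owning `holds`; False means blocked."""
    if lock in holds:
        holds[lock] += 1         # re-acquisition by the owner: bump the counter
        return True
    if lock in busy:
        return False             # held by another thread: no rule applies yet
    holds[lock] = 0              # first acquisition
    busy.add(lock)
    return True


def release(holds, busy, lock):
    """Release `lock` once; False means the thread does not hold it."""
    if lock not in holds:
        return False
    if holds[lock] > 0:
        holds[lock] -= 1         # undo one re-acquisition
    else:
        del holds[lock]          # last release: free the lock for other threads
        busy.discard(lock)
    return True
```

A lock acquired twice by the same thread must be released twice before another thread can take it.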
rule <k> rendezvous V:Val; => .K ...</k> <k> rendezvous V; => .K ...</k>
Turns a list of parameters and a list of instance values for them
into a list of variable declarations.
  syntax Stmt ::= mkDecls(Params,Vals)  [function]
  rule mkDecls((T:Type X:Id, Ps:Params), (V:Val, Vs:Vals))
    => T X=V; mkDecls(Ps,Vs)
  rule mkDecls(.Params,.Vals) => {}
Location lookup.
  syntax Exp ::= lookup(Int)  // see NOTES.md for why Exp instead of KItem
  rule <k> lookup(L) => V ...</k> <store>... L |-> V:Val ...</store>
Environment recovery.
// TODO: same comment regarding setEnv(...) as for simple untyped

  syntax KItem ::= setEnv(Map)
  rule <k> setEnv(Env) => .K ...</k>  <env> _ => Env </env>
  rule (setEnv(_) => .K) ~> setEnv(_)
lvalue and loc
  syntax Exp ::= lvalue(K)
  syntax Val ::= loc(Int)

  rule <k> lvalue(X:Id => loc(L)) ...</k>  <env>... X |-> L:Int ...</env>

  //context lvalue(_[HOLE])
  //context lvalue(HOLE[_])
  context lvalue(_::Exp[HOLE::Exps])
  context lvalue(HOLE::Exp[_::Exps])

  rule lvalue(lookup(L:Int) => loc(L))
Adds the corresponding depth to an array type
  syntax Type ::= Type "<" Vals ">"  [function]
  rule T:Type<_,Vs:Vals> => T[]<Vs>
  rule T:Type<.Vals> => T
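In other words, the operation appends one array dimension to the type for each element of the list. A one-line Python sketch, assuming types are written as strings and using the hypothetical name `add_dimensions`:

```python
def add_dimensions(elem_type, extra_dims):
    """Append one "[]" to elem_type for each remaining dimension in extra_dims."""
    return elem_type + "[]" * len(extra_dims)
```

For instance, with two remaining dimensions the type `int[]` becomes `int[][][]`, and with none it is left unchanged.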
Sequences of locations.
  syntax Map ::= Int "..." Int "|->" K  [function]
  rule N...M |-> _ => .Map  requires N >Int M
  rule N...M |-> K => N |-> K (N +Int 1)...M |-> K  requires N <=Int M

// Type of a value.
  syntax Type ::= typeOf(K)  [function]
  rule typeOf(_:Int) => int
  rule typeOf(_:Bool) => bool
  rule typeOf(_:String) => string
  rule typeOf(array(T,_,_)) => (T[])  // () needed! K parses [] as "no tags"
  rule typeOf(lambda(T,Ps,_)) => getTypes(Ps) -> T
  rule typeOf(undefined(T)) => T
  rule typeOf(nothing(T)) => T
List of types of a parameter.
  syntax Types ::= getTypes(Params)  [function]
  rule getTypes(T:Type _:Id) => T, .Types   // I would like to not use .Types
  rule getTypes(T:Type _:Id, P, Ps) => T, getTypes(P,Ps)
  rule getTypes(.Params) => void, .Types
endmodule
Author: Grigore Roșu (grosu@illinois.edu)
Organization: University of Illinois at Urbana-Champaign
Author: Traian Florin Șerbănuță (traian.serbanuta@unibuc.ro)
Organization: University of Bucharest
This is the K semantic definition of the untyped KOOL language. KOOL
is aimed at being a pedagogical and research language that captures
the essence of the object-oriented programming paradigm. Its untyped
variant discussed here is simpler than the typed one, ignoring several
intricate aspects of types in the presence of objects. A program
consists of a set of class declarations. Each class can extend at
most one other class (KOOL is single-inheritance). A class can
declare a set of fields and a set of methods, all public and called
the class' members. Specifically, KOOL includes the
following features:
Class declarations, where a class may or may not explicitly extend
another class. If a class does not explicitly extend another class, it
is assumed to extend the default top-most and empty (i.e., no members)
class called Object. Each class is required to declare precisely one
homonymous method, called its constructor. Each valid program should
contain one class named Main, whose constructor, Main(), takes no
arguments. The execution of a program consists of creating an object
instance of class Main and invoking the constructor Main() on it, that
is, of executing new Main();.
All features of SIMPLE (see examples/simple/untyped), i.e.,
multidimensional arrays, function (here called "method") abstractions
with call-by-value parameter passing style and static scoping, blocks
with locals, input/output, parametric exceptions, and concurrency via
dynamic thread creation/termination and synchronization. The only
change in the syntax of SIMPLE when imported in KOOL is the function
declaration keyword, function, which is changed into method. The exact
same desugaring macros from SIMPLE are also included in KOOL. We can
think of KOOL's classes as embedding SIMPLE programs (extended with OO
constructs, as discussed next).
Object creation using the new C(e1,...,en) expression construct. An
object instance of class C is first created and then the constructor
C(e1,...,en) is implicitly called on that object. KOOL only allows (and
requires) one constructor per class. The class constructor can be called
either implicitly during a new object creation for the class, or
explicitly. The superclass constructor is not implicitly invoked when a
class constructor is invoked; if you want to invoke the superclass
constructor from a subclass constructor then you have to do it
explicitly.
An expression construct this, which evaluates to the current object.
An expression construct super, which is used (only) in combination with
member lookup (see next) to refer to a superclass field or method.
A member lookup expression construct e.x, where e is an expression
(either an expression expected to evaluate to an object or the super
construct) and x is a class member name, that is, a field or a method
name.
Expression constructs e instanceOf C and (C) e, where e is an
expression expected to evaluate to an object and C a class name. The
former tells whether the class of e is a subclass of C, that is,
whether e can be used as an instance of C, and the latter changes the
class of e to C. These operations always succeed: the former returns a
Boolean value, while the latter changes the current class of e to C
regardless of whether it is safe to do so or not. The typed version of
KOOL will check the safety of casting by ensuring that the instance
class of the object is a subclass of C. In untyped KOOL we do not want
to perform this check because we want to allow the programmer maximum
flexibility: if one always accesses only available members, then the
program can execute successfully despite the potentially unsafe cast.
There are some specific aspects of KOOL that need to be discussed.
First, KOOL is higher-order, allowing function abstractions to be treated like any other values in the language. For example, if m is a method of object e then e.m evaluates to the corresponding function abstraction. The function abstraction is in fact a closure, because in addition to the method parameters and body it also encapsulates the object value (i.e., the environment of the object together with its current class; see below) that e evaluates to. This way, function abstractions can be invoked anywhere and have the capability to change the state of their object. For example, if m is a method of object e which increments a field c of e when invoked, and if getm is another method of e which simply returns m when invoked, then the double application (e.getm())() has the same effect as e.m(), that is, increments the counter c of e. Note that the higher-order nature of KOOL was not originally planned; it came as a natural consequence of evaluating methods to closures and we decided to keep it. If you do not like it then do not use it.
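The double application described above has a close analogue in Python's bound methods, which also package a function together with its receiver. The following is only an illustrative Python sketch, not KOOL or K code; the class name Counter is hypothetical, while m, getm and c mirror the names used in the text.

```python
class Counter:
    """Hypothetical stand-in for the object e from the text."""

    def __init__(self):
        self.c = 0

    def m(self):
        # Increments the field c of the receiving object.
        self.c += 1

    def getm(self):
        # Returns m as a value; the bound method closes over the object.
        return self.m


e = Counter()
(e.getm())()                      # double application, as in (e.getm())()
count_after_double_application = e.c
e.m()                             # direct invocation
count_after_direct_call = e.c
```

Both calls update the same object, which is the behavior the text attributes to KOOL method closures.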
Second, since all the fields and methods are public in KOOL and since they can be redeclared in subclasses, it is not immediately clear how to look up the member x when we write e.x and e is different from super. We distinguish two cases, depending on whether e.x occurs in a method invocation context (i.e., e.x(...)) or in a field context. KOOL has dynamic method dispatch, so if e.x is invoked as a method then x will be searched for starting with the instance class of the object value to which e evaluates. If e.x occurs in a non-method-invocation context then x will be treated as a field (although it may hold a method closure due to the higher-order nature of KOOL) and thus will be searched for starting with the current class of the object value of e (which, because of this and casting, may be different from its instance class). In order to achieve the above, each object value will consist of a pair holding the current class of the object and an environment stack with one layer for each class in the object's instance class hierarchy.
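The two lookup disciplines can be modeled in a few lines of Python. This is a sketch under simplifying assumptions, not the K definition: an object value is a plain tuple (current class, environment stack), each stack frame is a (class name, member-to-location dictionary) pair with the most derived class first, and the function names are hypothetical.

```python
def lookup_member(frames, name):
    """Return the location of name in the first frame that declares it."""
    for _class_name, env in frames:
        if name in env:
            return env[name]
    raise LookupError(f"member {name!r} not found (execution would get stuck)")


def field_location(obj, name):
    """Field context: search starts at the layer of the *current* class."""
    current_class, frames = obj
    start = 0
    while start < len(frames) and frames[start][0] != current_class:
        start += 1  # skip layers of more derived classes
    return lookup_member(frames[start:], name)


def method_location(obj, name):
    """Method invocation context: search starts at the *instance* class."""
    _current_class, frames = obj
    return lookup_member(frames, name)


# An instance of B (which extends A), currently viewed as an A.
stack = [("B", {"x": 3, "m": 4}), ("A", {"x": 1, "m": 2}), ("Object", {})]
obj = ("A", stack)
field_x = field_location(obj, "x")    # found in A's layer
method_m = method_location(obj, "m")  # found in B's layer
```

With the current class set to A, the field x resolves to A's layer, while the method m still resolves to B's layer, which is the distinction the paragraph above draws.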
Third, although KOOL uses dynamic method dispatch, its capabilities described above are powerful enough to allow us to mimic static method dispatch. For example, suppose that you want to invoke method m() statically. Then all you need to do is to declare a local variable and bind it to m, for example var staticm = m;, and then call staticm(). This works because staticm is first bound to the method closure that m evaluates to, and then looked up as any local variable when invoked. We only enable the dynamic method dispatch when we have an object member on an application position, e.g., m().
In what follows, we limit our comments to the new, KOOL-specific aspects of the language. We refer the reader to the untyped SIMPLE language for documentation on the remaining features, because those were all borrowed from SIMPLE.
module KOOL-UNTYPED-SYNTAX
  imports DOMAINS-SYNTAX
The syntax of KOOL extends that of SIMPLE with object-oriented constructs. We removed from the K annotated syntax of SIMPLE two constructs, namely the one for function declarations (because we want to call them methods now) and the one for function application (because application is not strict in the first argument anymore, as it needs to initiate dynamic method dispatch). The additional syntax includes:
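The special identifier Object, for the default top-most class.
The renaming of the function keyword of SIMPLE into method.
Class declarations, with or without extends; a class declared without extends is desugared into one that extends Object.
The KOOL-specific expression constructs marked in the grammar below, together with an application construct that is now only strict(2).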
  syntax Id ::= "Object" [token] | "Main" [token]

  syntax Stmt ::= "var" Exps ";"
                | "method" Id "(" Ids ")" Block  // called "function" in SIMPLE
                | "class" Id Block               // KOOL
                | "class" Id "extends" Id Block  // KOOL

  syntax Exp ::= Int | Bool | String | Id
               | "this"                                 // KOOL
               | "super"                                // KOOL
               | "(" Exp ")"             [bracket]
               | "++" Exp
               | Exp "instanceOf" Id     [strict(1)]    // KOOL
               | "(" Id ")" Exp          [strict(2)]    // KOOL cast
               | "new" Id "(" Exps ")"   [strict(2)]    // KOOL
               | Exp "." Id                             // KOOL
               > Exp "[" Exps "]"        [strict]
               > Exp "(" Exps ")"        [strict(2)]    // was strict in SIMPLE
               | "-" Exp                 [strict]
               | "sizeOf" "(" Exp ")"    [strict]
               | "read" "(" ")"
               > left:
                 Exp "*" Exp             [strict, left]
               | Exp "/" Exp             [strict, left]
               | Exp "%" Exp             [strict, left]
               > left:
                 Exp "+" Exp             [strict, left]
               | Exp "-" Exp             [strict, left]
               > non-assoc:
                 Exp "<" Exp             [strict, non-assoc]
               | Exp "<=" Exp            [strict, non-assoc]
               | Exp ">" Exp             [strict, non-assoc]
               | Exp ">=" Exp            [strict, non-assoc]
               | Exp "==" Exp            [strict, non-assoc]
               | Exp "!=" Exp            [strict, non-assoc]
               > "!" Exp                 [strict]
               > left:
                 Exp "&&" Exp            [strict(1), left]
               | Exp "||" Exp            [strict(1), left]
               > "spawn" Block
               > Exp "=" Exp             [strict(2), right]

  syntax Ids  ::= List{Id,","}

  syntax Exps ::= List{Exp,","}          [strict, overload(exps)]
  syntax Val
  syntax Vals ::= List{Val,","}          [overload(exps)]

  syntax Block ::= "{" "}"
                | "{" Stmt "}"

  syntax Stmt ::= Block
                | Exp ";"                               [strict]
                | "if" "(" Exp ")" Block "else" Block   [avoid, strict(1)]
                | "if" "(" Exp ")" Block                [macro]
                | "while" "(" Exp ")" Block
                | "for" "(" Stmt Exp ";" Exp ")" Block  [macro]
                | "return" Exp ";"                      [strict]
                | "return" ";"                          [macro]
                | "print" "(" Exps ")" ";"              [strict]
                | "try" Block "catch" "(" Id ")" Block
                | "throw" Exp ";"                       [strict]
                | "join" Exp ";"                        [strict]
                | "acquire" Exp ";"                     [strict]
                | "release" Exp ";"                     [strict]
                | "rendezvous" Exp ";"                  [strict]

  syntax Stmt ::= Stmt Stmt                             [right]
Old desugaring rules, from SIMPLE
  rule if (E) S => if (E) S else {}
  rule for(Start Cond; Step) {S} => {Start while (Cond) {S Step;}}
  rule var E1::Exp, E2::Exp, Es::Exps; => var E1; var E2, Es;  [anywhere]
  rule var X::Id = E; => var X; X = E;                         [anywhere]
New desugaring rule
  rule class C:Id S => class C extends Object S  // KOOL

endmodule
We first discuss the new configuration of KOOL, which extends that of
SIMPLE. Then we include the semantics of the constructs borrowed from
SIMPLE unchanged; we refrain from discussing those, because they were
already discussed in the K definition of SIMPLE. Then we discuss
changes to SIMPLE's semantics needed for the more general meaning of
the previous SIMPLE constructs (for example for thread spawning,
assignment, etc.). Finally, we discuss in detail the
semantics of the additional KOOL constructs.
module KOOL-UNTYPED
  imports KOOL-UNTYPED-SYNTAX
  imports DOMAINS
KOOL removes one cell and adds two nested cells to the configuration
of SIMPLE. The cell which is removed is the one holding the global
environment, because a KOOL program consists of a set of classes only,
with no global declarations. In fact, since informally speaking each
KOOL class now includes a SIMPLE program, it is safe to say that the
global variables in SIMPLE became class fields in KOOL. Let us now
discuss the new cells that are added to the configuration of SIMPLE.
The cell crntObj holds data pertaining to the current object, that is, the object environment in which the code in cell k executes: crntClass holds the current class (which can change as methods of the current object are invoked); envStack holds the stack of environments as a list, each layer corresponding to one class in the object's instance class hierarchy; location, which is optional, holds the location in the store where the current object is or has to be located (this is useful both for method closures and for the semantics of object creation).
The cell classes holds all the declared classes, each class being held in its own classData cell which contains a name (className), a parent (baseClass), and the actual member declarations (declarations).
// the syntax declarations below are required because the sorts are // referenced directly by a production and, because of the way KIL to KORE // is implemented, the configuration syntax is not available yet // should simply work once KIL is removed completely // check other definitions for this hack as well syntax EnvCell syntax ControlCell syntax EnvStackCell syntax CrntObjCellFragment configuration <T color="red"> <threads color="orange"> <thread multiplicity="*" type="Set" color="yellow"> <k color="green"> $PGM:Stmt ~> execute </k> //<br/> // TODO(KORE): support latex annotations #1799 <control color="cyan"> <fstack color="blue"> .List </fstack> <xstack color="purple"> .List </xstack> //<br/> // TODO(KORE): support latex annotations #1799 <crntObj color="Fuchsia"> // KOOL <crntClass> Object </crntClass> <envStack> .List </envStack> <location multiplicity="?"> .K </location> </crntObj> </control> //<br/> // TODO(KORE): support latex annotations #1799 <env color="violet"> .Map </env> <holds color="black"> .Map </holds> <id color="pink"> 0 </id> </thread> </threads> //<br/> // TODO(KORE): support latex annotations #1799 <store color="white"> .Map </store> <busy color="cyan">.Set </busy> <terminated color="red"> .Set </terminated> <input color="magenta" stream="stdin"> .List </input> <output color="brown" stream="stdout"> .List </output> <nextLoc color="gray"> 0 </nextLoc> //<br/> // TODO(KORE): support latex annotations #1799 <classes color="Fuchsia"> // KOOL <classData multiplicity="*" type="Map" color="Fuchsia"> // the Map has as its key the first child of the cell, // in this case the className cell. <className color="Fuchsia"> Main </className> <baseClass color="Fuchsia"> Object </baseClass> <declarations color="Fuchsia"> .K </declarations> </classData> </classes> </T>
The semantics below is taken over from SIMPLE unchanged.
The semantics of function declaration and invocation, including the use of the special lambda abstraction value, needs to change in order to account for the fact that methods are now invoked into their object's environment. The semantics of function return actually stays unchanged. Also, the semantics of program initialization is different: now we have to create an instance of the Main class which also calls the constructor Main(), while in SIMPLE we only had to invoke the function main().
Finally, the semantics of thread spawning needs to change, too: the parent thread needs to also share its object environment with the spawned thread (in addition to its local environment, like in SIMPLE). This is needed in order to be able to spawn method invocations under dynamic method dispatch; for example, spawn { run(); } will need to look up the method run() in the newly created thread, an operation which will most likely fail unless the child thread sees the object environment of the parent thread. Note that the spawn statement of KOOL is more permissive than the threads of Java. In fact, the latter can be implemented in terms of our spawn; see the program threads.kool for a sketch.
Below is a subset of the values of SIMPLE, which are also values
of KOOL. We will add other values later in the semantics, such as
object and method closures.
  syntax Val ::= Int | Bool | String
               | array(Int,Int)
  syntax Exp ::= Val
  syntax Exps ::= Vals
  syntax KResult ::= Val
  syntax KResult ::= Vals
The semantics below are taken verbatim from the untyped SIMPLE
definition.
syntax KItem ::= "undefined" rule <k> var X:Id; => .K ...</k> <env> Env => Env[X <- L] </env> <store>... .Map => L |-> undefined ...</store> <nextLoc> L:Int => L +Int 1 </nextLoc> context var _:Id[HOLE]; rule <k> var X:Id[N:Int]; => .K ...</k> <env> Env => Env[X <- L] </env> <store>... .Map => L |-> array(L +Int 1, N) (L +Int 1) ... (L +Int N) |-> undefined ...</store> <nextLoc> L:Int => L +Int 1 +Int N </nextLoc> requires N >=Int 0 syntax Id ::= "$1" [token] | "$2" [token] rule var X:Id[N1:Int, N2:Int, Vs:Vals]; => var X[N1]; { var $1=X; for(var $2=0; $2 <= N1 - 1; ++$2) { var X[N2,Vs]; $1[$2] = X; } } rule <k> X:Id => V ...</k> <env>... X |-> L ...</env> <store>... L |-> V:Val ...</store> context ++(HOLE => lvalue(HOLE)) rule <k> ++loc(L) => I +Int 1 ...</k> <store>... L |-> (I:Int => I +Int 1) ...</store> rule I1 + I2 => I1 +Int I2 rule Str1 + Str2 => Str1 +String Str2 rule I1 - I2 => I1 -Int I2 rule I1 * I2 => I1 *Int I2 rule I1 / I2 => I1 /Int I2 requires I2 =/=K 0 rule I1 % I2 => I1 %Int I2 requires I2 =/=K 0 rule - I => 0 -Int I rule I1 < I2 => I1 <Int I2 rule I1 <= I2 => I1 <=Int I2 rule I1 > I2 => I1 >Int I2 rule I1 >= I2 => I1 >=Int I2 rule V1:Val == V2:Val => V1 ==K V2 rule V1:Val != V2:Val => V1 =/=K V2 rule ! T => notBool(T) rule true && E => E rule false && _ => false rule true || _ => true rule false || E => E rule V:Val[N1:Int, N2:Int, Vs:Vals] => V[N1][N2, Vs] [anywhere] rule array(L,_)[N:Int] => lookup(L +Int N) [anywhere] rule sizeOf(array(_,N)) => N
The semantics of function application needs to change into dynamic
method dispatch invocation, which is defined shortly. However,
interestingly, the semantics of return stays unchanged.
rule <k> return(V:Val); ~> _ => V ~> K </k> <control> <fstack> ListItem(fstackFrame(Env,K,XS,<crntObj> CO </crntObj>)) => .List ...</fstack> <xstack> _ => XS </xstack> <crntObj> _ => CO </crntObj> </control> <env> _ => Env </env> syntax Val ::= "nothing" rule return; => return nothing; rule <k> read() => I ...</k> <input> ListItem(I:Int) => .List ...</input> context (HOLE => lvalue(HOLE)) = _ rule <k> loc(L) = V:Val => V ...</k> <store>... L |-> (_ => V) ...</store> rule {} => .K rule <k> { S } => S ~> setEnv(Env) ...</k> <env> Env </env> rule S1::Stmt S2::Stmt => S1 ~> S2 rule _:Val; => .K rule if ( true) S else _ => S rule if (false) _ else S => S rule while (E) S => if (E) {S while(E)S} rule <k> print(V:Val, Es => Es); ...</k> <output>... .List => ListItem(V) </output> rule print(.Vals); => .K syntax KItem ::= xstackFrame(Id,Stmt,K,Map,K) // TODO(KORE): drop the additional production once parsing issue #1842 is fixed | (Id,Stmt,K,Map,K) syntax KItem ::= "popx" rule <k> (try S1 catch(X) {S2} => S1 ~> popx) ~> K </k> <control> <xstack> .List => ListItem(xstackFrame(X, S2, K, Env, C)) ...</xstack> C </control> <env> Env </env> rule <k> popx => .K ...</k> <xstack> ListItem(_) => .List ...</xstack> rule <k> throw V:Val; ~> _ => { var X = V; S2 } ~> K </k> <control> <xstack> ListItem(xstackFrame(X, S2, K, Env, C)) => .List ...</xstack> (_ => C) </control> <env> _ => Env </env>
Thread spawning needs a new semantics, because we want the child
thread to also share the object environment with its parent. The new
semantics of thread spawning will be defined shortly. However,
interestingly, the other concurrency constructs keep their semantics
from SIMPLE unchanged.
// TODO(KORE): ..Bag should be . throughout this definition #1772 rule (<thread>... <k>.K</k> <holds>H</holds> <id>T</id> ...</thread> => .Bag) /* rule (<thread>... <k>.</k> <holds>H</holds> <id>T</id> ...</thread> => .) */ <busy> Busy => Busy -Set keys(H) </busy> <terminated>... .Set => SetItem(T) ...</terminated> rule <k> join T:Int; => .K ...</k> <terminated>... SetItem(T) ...</terminated> rule <k> acquire V:Val; => .K ...</k> <holds>... .Map => V |-> 0 ...</holds> <busy> Busy (.Set => SetItem(V)) </busy> requires (notBool(V in Busy:Set)) rule <k> acquire V; => .K ...</k> <holds>... V:Val |-> (N:Int => N +Int 1) ...</holds> rule <k> release V:Val; => .K ...</k> <holds>... V |-> (N => N:Int -Int 1) ...</holds> requires N >Int 0 rule <k> release V; => .K ...</k> <holds>... V:Val |-> 0 => .Map ...</holds> <busy>... SetItem(V) => .Set ...</busy> rule <k> rendezvous V:Val; => .K ...</k> <k> rendezvous V; => .K ...</k>
syntax Stmt ::= mkDecls(Ids,Vals) [function] rule mkDecls((X:Id, Xs:Ids), (V:Val, Vs:Vals)) => var X=V; mkDecls(Xs,Vs) rule mkDecls(.Ids,.Vals) => {} // TODO(KORE): clarify sort inferences #1803 syntax Exp ::= lookup(Int) /* syntax KItem ::= lookup(Int) */ rule <k> lookup(L) => V ...</k> <store>... L |-> V:Val ...</store> syntax KItem ::= setEnv(Map) rule <k> setEnv(Env) => .K ...</k> <env> _ => Env </env> rule (setEnv(_) => .K) ~> setEnv(_) // TODO: How can we make sure that the second rule above applies before the first one? // Probably we'll deal with this using strategies, eventually. syntax Exp ::= lvalue(K) syntax Val ::= loc(Int) rule <k> lvalue(X:Id => loc(L)) ...</k> <env>... X |-> L:Int ...</env> context lvalue(_::Exp[HOLE::Exps]) context lvalue(HOLE::Exp[_::Exps]) rule lvalue(lookup(L:Int) => loc(L)) syntax Map ::= Int "..." Int "|->" K [function] rule N...M |-> _ => .Map requires N >Int M rule N...M |-> K => N |-> K (N +Int 1)...M |-> K requires N <=Int M
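The last declaration in the block above, N...M |-> K, builds a map that sends every location from N to M to the same item; the array declaration rule borrowed from SIMPLE uses it to initialize a freshly allocated block of cells. A minimal Python model of that pair of definitions might look as follows. The function names and the tuple encoding of array(L, N) are hypothetical, and the store is modeled as an ordinary dictionary.

```python
def range_map(n, m, item):
    """Model of `N...M |-> K`: empty when n > m, else n..m all map to item."""
    result = {}
    while n <= m:
        result[n] = item
        n += 1
    return result


def allocate_array(store, next_loc, size):
    """Model of `var X[N];`: one cell for the array value, then N cells."""
    if size < 0:
        raise ValueError("the rule requires N >=Int 0")
    new_store = dict(store)
    # The array value records where its elements start and how many there are.
    new_store[next_loc] = ("array", next_loc + 1, size)
    new_store.update(range_map(next_loc + 1, next_loc + size, "undefined"))
    return new_store, next_loc + 1 + size


store, next_loc = allocate_array({}, 5, 3)
```

Starting from location 5 with three elements, the model reserves location 5 for the array value and locations 6 to 8 for its elements, leaving 9 as the next free location.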
When we extend a language, sometimes we need to do more than just add
new language constructs and semantics for them. Sometimes we want to
also extend the semantics of existing language constructs, in order to
get more from them.
In SIMPLE, once all the global declarations were processed, the function main() was invoked. In KOOL, the global declarations are classes, and their specific semantics is given shortly; essentially, they are pre-processed one by one and added into the classes cell structure in the configuration. Once all the classes are processed, the computation item execute, which was placed right after the program in the initial configuration, is reached. In SIMPLE, the program was initialized by calling the function main(). In KOOL, the program is initialized by creating an object instance of class Main. This will also implicitly call the method Main() (the Main class constructor). The emptiness of the env cell below is just a sanity check, to make sure that the user has not declared anything but classes at the top level of the program.
  syntax KItem ::= "execute"
  rule <k> execute => new Main(.Exps); </k>
       <env> .Map </env>
The semantics of new (defined below) requires the execution of all the class' declarations (and also of its superclasses').
Before we can define the semantics of method application (previously
called function application in SIMPLE), we need to add two more values
to the language, namely object and method closures:
  syntax Val ::= objectClosure(Id, List)
               | methodClosure(Id,Int,Ids,Stmt)
An object value consists of an objectClosure-wrapped pair containing the current class of the object and the environment stack of the object. The current class of an object will always be one of the classes mapped to an environment in the environment stack of the object. A method closure encapsulates the method's parameters and code (last two arguments), as well as the object context in which the method code should execute. This object context includes the current class of the object (the first argument of methodClosure) and the object environment stack (located in the object stored at the location specified as the second argument of methodClosure).
KOOL has a complex mechanism to invoke methods, because it allows both dynamic method dispatch and methods as first-class-citizen values (the latter making it a higher-order language). The invocation mechanism will be defined later. What is sufficient to know for now is that the two arguments of the application construct eventually reduce to values, the first being a method closure and the latter a list of values. The semantics of the method closure application is then as expected: the local environment and control are stacked, then we switch to method closure's class and object environment and execute the method body. The mkDecls construct is the one that came with the unchanged semantics of SIMPLE above.
  syntax KItem ::= fstackFrame(Map,K,List,K)
                 // TODO(KORE): drop the additional production once parsing issue #1842 is fixed
                 | (Map,K,K)

  rule <k> methodClosure(Class,OL,Xs,S)(Vs:Vals) ~> K
           => mkDecls(Xs,Vs) S return; </k>
       <env> Env => .Map </env>
       <store>... OL |-> objectClosure(_, EnvStack)...</store>
       //<br/> // TODO(KORE): support latex annotations #1799
       <control>
          <xstack> XS </xstack>
          <fstack> .List => ListItem(fstackFrame(Env, K, XS, <crntObj> Obj' </crntObj>))
          ...</fstack>
          <crntObj> Obj' => <crntClass> Class </crntClass> <envStack> EnvStack </envStack> </crntObj>
       </control>
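The stacking discipline of the application rule above, and its undoing by return, can be sketched in Python. This is a deliberately simplified model rather than the K semantics: it ignores the continuation K and the exception stack, binds parameters directly to values instead of going through store locations, and all names (ThreadState, invoke, do_return) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class ThreadState:
    env: Dict[str, Any] = field(default_factory=dict)   # local environment
    crnt_obj: Tuple[str, tuple] = ("Object", ())        # (current class, env stack)
    fstack: List[tuple] = field(default_factory=list)   # saved caller frames


def invoke(state, closure_class, env_stack, params, args):
    """Enter a method: save the caller's context, switch to the closure's."""
    state.fstack.insert(0, (state.env, state.crnt_obj))
    state.crnt_obj = (closure_class, env_stack)
    # mkDecls declares the parameters in a fresh local environment.
    state.env = dict(zip(params, args))


def do_return(state, value):
    """Leave a method: restore the caller's environment and current object."""
    state.env, state.crnt_obj = state.fstack.pop(0)
    return value


state = ThreadState(env={"x": 10}, crnt_obj=("Main", ()))
invoke(state, "A", (("A", {}), ("Object", {})), ["n"], [7])
inside = (dict(state.env), state.crnt_obj[0])
result = do_return(state, 42)
```

While the body runs, the local environment contains only the parameter n and the current class is the one recorded in the closure; after the return, the caller's environment and current object are back in place.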
We want to extend the semantics of spawn to also share the current object environment with the child thread, in addition to the current environment. This extension will allow us to also use method invocations in the spawned statements, which will thus be looked up as expected, using dynamic method dispatch. This lookup operation would fail if the child thread did not have access to its parent's object environment.
  rule <thread>...
         <k> spawn S => !T:Int ...</k>
         <env> Env </env>
         <crntObj> Obj </crntObj>
       ...</thread>
       (.Bag => <thread>...
               <k> S </k>
               <env> Env </env>
               <id> !T </id>
               <crntObj> Obj </crntObj>
             ...</thread>)
Initially, the classes forming the program are moved into their
corresponding cells:
  rule <k> class Class1 extends Class2 { S } => .K ...</k>
       <classes>... (.Bag => <classData>
                            <className> Class1 </className>
                            <baseClass> Class2 </baseClass>
                            <declarations> S </declarations>
                        </classData>)
       ...</classes>
Like in SIMPLE, method names are added to the environment and bound to their code. However, unlike in SIMPLE where each function was executed in the same environment, namely the program global environment, a method in KOOL needs to be executed in its object's environment. Thus, methods evaluate to closures, which encapsulate their object's context (i.e., the current class and environment stack of the object) in addition to the method's parameters and body. This approach of binding method names to method closures in the environment will also allow objects to pass their methods to other objects, to dynamically change their methods by assigning them other method closures, and even to allow all these to be done from other objects. This gives the KOOL programmer a lot of power; one should use this power wisely, though, because programs can easily become hard to understand and reason about if one overuses these features.
  rule <k> method F:Id(Xs:Ids) S => .K ...</k>
       <crntClass> Class:Id </crntClass>
       <location> OL:Int </location>
       <env> Env => Env[F <- L] </env>
       <store>... .Map => L |-> methodClosure(Class,OL,Xs,S) ...</store>
       <nextLoc> L => L +Int 1 </nextLoc>
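The effect of the method declaration rule above on the environment, store and next free location can be modeled with a small Python function. This is only a sketch: the state is a plain dictionary whose keys are hypothetical names for the corresponding cells, and the method body is kept as an opaque string.

```python
def declare_method(state, name, params, body):
    """Bind name to a fresh location holding a method closure."""
    loc = state["next_loc"]
    state["env"][name] = loc
    # The closure records the current class and the *location* of the object,
    # not the object itself, which is still under construction at this point.
    state["store"][loc] = (
        "methodClosure", state["crnt_class"], state["location"], params, body
    )
    state["next_loc"] = loc + 1
    return loc


state = {"env": {}, "store": {}, "next_loc": 8, "crnt_class": "A", "location": 7}
loc_of_m = declare_method(state, "m", ["n"], "body of m")
```

Storing the object's location rather than its value lets the closure reach the complete object later, once it has been stored at that location.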
The semantics of new consists of two actions: memory allocation for the new object and execution of the corresponding constructor. Then the created object is returned as the result of the new operation; the value returned by the constructor, if any, is discarded. The current environment and object are stored onto the stack and recovered after new (according to the semantics of return borrowed from SIMPLE, when the statement return this; in the rule below is reached and evaluated), because the object creation part of new will destroy them. The rule below also initializes the object creation process by emptying the local environment and the current object, and allocating a location in the store where the created object will be eventually stored (this is what the storeObj task after the object creation task in the rule below will do; its rule is defined shortly). The location where the object will be stored is also made available in the crntObj cell, so that method closures can refer to it (see rule above).
  syntax KItem ::= "envStackFrame" "(" Id "," Map ")"

  rule <k> new Class:Id(Vs:Vals) ~> K
           => create(Class) ~> storeObj ~> Class(Vs); return this; </k>
       <env> Env => .Map </env>
       <nextLoc> L:Int => L +Int 1 </nextLoc>
       //<br/> // TODO(KORE): support latex annotations #1799
       <control>
         <xstack> XS </xstack>
         <crntObj> Obj
                   => <crntClass> Object </crntClass>
                      <envStack> ListItem(envStackFrame(Object, .Map)) </envStack>
                      <location> L </location>
         </crntObj>
         <fstack> .List => ListItem(fstackFrame(Env, K, XS, <crntObj> Obj </crntObj>)) ...</fstack>
       </control>
The creation of a new object (the memory allocation part only) is a recursive process, which first requires creating an object for the superclass. A memory object representation is a layered structure: for each class on the path from the instance class to the root of the hierarchy there is a layer including the memory allocated for the members (both fields and methods) of that class.
  syntax KItem ::= create(Id)

  rule <k> create(Class:Id)
           => create(Class1) ~> setCrntClass(Class) ~> S ~> addEnvLayer ...</k>
       <className> Class </className>
       <baseClass> Class1:Id </baseClass>
       <declarations> S </declarations>

  rule <k> create(Object) => .K ...</k>
The next operation sets the current class of the current object.
This is necessary to be done at each layer, because the current class
of the object is enclosed as part of the method closures (see the
semantics of method declarations above).
  syntax KItem ::= setCrntClass(Id)

  rule <k> setCrntClass(C) => .K ...</k>
       <crntClass> _ => C </crntClass>
The next operation adds a new tagged environment layer to the current object and gets ready for the next layer by clearing the environment (note that create expects the environment to be empty).
  syntax KItem ::= "addEnvLayer"

  rule <k> addEnvLayer => .K ...</k>
       <env> Env => .Map </env>
       <crntClass> Class:Id </crntClass>
       <envStack> .List => ListItem(envStackFrame(Class, Env)) ...</envStack>
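Taken together, create, setCrntClass and addEnvLayer build the environment stack from the root of the hierarchy towards the instance class. The recursion can be sketched in Python as follows. The model is an assumption-laden simplification: a class is described only by its parent and a list of member names, each member receives exactly one fresh location, and the class table and function name are hypothetical.

```python
def create(class_name, classes, next_loc):
    """Return (environment stack, next free location) for class_name."""
    if class_name == "Object":
        # The root layer is empty; `new` places it on the stack initially.
        return [("Object", {})], next_loc
    base, members = classes[class_name]
    # First build the layers of the superclass ...
    stack, next_loc = create(base, classes, next_loc)
    # ... then run this class's declarations in an empty environment ...
    env = {}
    for member in members:
        env[member] = next_loc
        next_loc += 1
    # ... and push the resulting layer, tagged with the class, on top.
    return [(class_name, env)] + stack, next_loc


classes = {"A": ("Object", ["x", "m"]), "B": ("A", ["x", "y"])}
# Location 0 is assumed to be already reserved by `new` for the object itself.
stack, next_loc = create("B", classes, 1)
```

The layer of the most derived class ends up first in the stack, which is the order that member lookup relies on; note that the redeclared member x gets a separate location in each layer.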
The following operation stores the created object at the location reserved by new. Note that the location reserved by new was temporarily stored in the crntObj cell precisely for this purpose. Now that the newly created object is stored at its location and that all method closures are aware of it, the location is unnecessary and thus we delete it from the crntObj cell.
  syntax KItem ::= "storeObj"

  rule <k> storeObj => .K ...</k>
       <crntObj>
         <crntClass> CC </crntClass>
         <envStack> ES </envStack>
         (<location> L:Int </location> => .Bag)
       </crntObj>
       <store>... .Map => L |-> objectClosure(CC, ES) ...</store>
The semantics of this is straightforward: evaluate to the current object.
  rule <k> this => objectClosure(CC, ES) ...</k>
       <crntObj>
         <crntClass> CC </crntClass>
         <envStack> ES </envStack>
       </crntObj>
We can access an object member (field or method) either explicitly, using the construct e.x, or implicitly, using only the member name x directly. The borrowed semantics of SIMPLE will already look up a sole name in the local environment. The first rule below reduces implicit member access to explicit access when the name cannot be found in the local environment. There are two cases to analyze for explicit object member access, depending upon whether the object is a proper object or it is just a redirection to the parent class via the construct super. In the first case, we evaluate the object expression and look up the member starting with the current class (static scoping). Note the use of the conditional evaluation context. In the second case, we just look up the member starting with the superclass of the current class. In both cases, the lookupMember task eventually yields a lookup(L) task for some appropriate location L, which will be further solved with the corresponding rule borrowed from SIMPLE. Note that the current object is not altered by super, so future method invocations see the entire object, as needed for dynamic method dispatch.
  rule <k> X:Id => this . X ...</k>
       <env> Env:Map </env>
    requires notBool(X in keys(Env))

  context HOLE._::Id requires (HOLE =/=K super)

// TODO: explain how Assoc matching has been replaced with two rules here.
// Maybe also improve it a bit.

/*  rule objectClosure(<crntClass> Class:Id </crntClass>
                     <envStack>... envStackFrame(Class,EnvC) EStack </envStack>)
       . X:Id
    => lookupMember(envStackFrame(Class,EnvC) EStack, X) */

  rule objectClosure(Class:Id, ListItem(envStackFrame(Class,Env)) EStack)
       . X:Id
    => lookupMember(ListItem(envStackFrame(Class,Env)) EStack, X)
  rule objectClosure(Class:Id, (ListItem(envStackFrame(Class':Id,_)) => .List) _)
       . _X:Id
    requires Class =/=K Class'

/*  rule <k> super . X => lookupMember(EStack, X) ...</k>
         <crntClass> Class </crntClass>
         <envStack>... envStackFrame(Class,EnvC) EStack </envStack> */
  rule <k> super . X => lookupMember(EStack, X) ...</k>
       <crntClass> Class:Id </crntClass>
       <envStack> ListItem(envStackFrame(Class,_)) EStack </envStack>
  rule <k> super . _X ...</k>
       <crntClass> Class </crntClass>
       <envStack> ListItem(envStackFrame(Class':Id,_)) => .List ...</envStack>
    requires Class =/=K Class'
Unlike in SIMPLE, in KOOL application was declared strict only in its second argument. That is because we want to ensure dynamic method dispatch when the first argument is a method access. As a consequence, we need to consider all the cases of interest for the first argument and to explicitly say what to do in each case. In all cases except for method access in a proper object (i.e., not super), we want the same behavior for the first argument as if it were not in a method invocation position. When it is a member access (the third rule below), we look it up starting with the instance class of the corresponding object. This ensures dynamic dispatch for methods; it actually dynamically dispatches field accesses, too, which is correct in KOOL, because one can assign method closures to fields and the field appeared in a method invocation context. The last context declaration below says that method applications or array accesses are also allowed as first argument to applications; that is because methods are allowed to return methods and arrays are allowed to hold methods in KOOL, since it is higher-order. If that is the case, then we want to evaluate the method call or the array access.
  rule <k> (X:Id => V)(_:Exps) ...</k>
       <env>... X |-> L ...</env>
       <store>... L |-> V:Val ...</store>

  rule <k> (X:Id => this . X)(_:Exps) ...</k>
       <env> Env </env>
    requires notBool(X in keys(Env))

  context HOLE._::Id(_) requires HOLE =/=K super

  rule (objectClosure(_, EStack) . X
    => lookupMember(EStack, X:Id))(_:Exps)

/*  rule <k> (super . X => lookupMember(EStack,X))(_:Exps)...</k>
         <crntClass> Class </crntClass>
         <envStack>... envStackFrame(Class,_) EStack </envStack> */
  rule <k> (super . X => lookupMember(EStack,X))(_:Exps)...</k>
       <crntClass> Class </crntClass>
       <envStack> ListItem(envStackFrame(Class,_)) EStack </envStack>
  rule <k> (super . _X)(_:Exps) ...</k>
       <crntClass> Class </crntClass>
       <envStack> ListItem(envStackFrame(Class':Id,_)) => .List ...</envStack>
    requires Class =/=K Class'

  // TODO(KORE): fix getKLabel #1801
  rule (A:Exp(B:Exps))(C:Exps) => A(B) ~> #freezerFunCall(C)
  rule (A:Exp[B:Exps])(C:Exps) => A[B] ~> #freezerFunCall(C)
  rule V:Val ~> #freezerFunCall(C:Exps) => V(C)
  syntax KItem ::= "#freezerFunCall" "(" K ")"
  /*
  context HOLE(_:Exps)
    when getKLabel(HOLE) ==K #klabel(`_(_)`) orBool getKLabel(HOLE) ==K #klabel(`_[_]`)
  */
Eventually, each of the rules above produces a lookup(L) task as a replacement for the method. When that happens, we just look up the value at location L:
  rule <k> (lookup(L) => V)(_:Exps) ...</k>
       <store>... L |-> V:Val ...</store>
The value V looked up above is expected to be a method closure, in which case the semantics of method application given above will apply. Otherwise, the execution will get stuck.
The instanceOf construct searches the object environment for a layer corresponding to the desired class. It returns true iff it can find the class, otherwise it returns false; it only gets stuck when its first argument does not evaluate to an object.
  rule objectClosure(_, ListItem(envStackFrame(C,_)) _)
       instanceOf C => true

  rule objectClosure(_, (ListItem(envStackFrame(C,_)) => .List) _)
       instanceOf C'  requires C =/=K C'
//TODO: remove the sort cast ::Id of C above, when sort inference bug fixed

  rule objectClosure(_, .List) instanceOf _ => false
In untyped KOOL, we prefer to not check the validity of casting. In
other words, any cast is allowed on any object, simply changing the
current class of the object to the desired class. The execution will
get stuck later if one attempts to access a field which is not
available. Moreover, the execution may complete successfully even
in the presence of invalid casts, provided that each accessed member
during the current execution is, or happens to be, available.
rule (C) objectClosure(_ , EnvStack) => objectClosure(C ,EnvStack)
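The cast rule above, together with the instanceOf rules before it, can be modeled in Python in a couple of lines each. This is a sketch, not the K semantics: an object value is a plain (current class, environment stack) tuple, frames are (class name, environment) pairs, and the function names are hypothetical.

```python
def instance_of(obj, class_name):
    """True iff some layer of the environment stack belongs to class_name."""
    _current_class, frames = obj
    return any(frame_class == class_name for frame_class, _env in frames)


def cast(class_name, obj):
    """Unchecked cast: replace the current class, keep the layers untouched."""
    _current_class, frames = obj
    return (class_name, frames)


frames = [("B", {}), ("A", {}), ("Object", {})]
obj = ("B", frames)
as_a = cast("A", obj)          # a safe upcast
as_c = cast("C", obj)          # an unsafe cast that is accepted anyway
```

Since the cast only replaces the first component, instance_of gives the same answers before and after any cast; an unsafe cast such as the one to C only causes trouble later, when a member lookup starts from a class that has no layer in the stack.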
Here we define all the auxiliary constructs used in the above
KOOL-specific semantics (those used in the SIMPLE fragment
have already been defined in a corresponding section above).
The current machinery borrowed with the semantics of SIMPLE allows us
to enrich the set of lvalues, this way allowing new means to assign
values to locations. In KOOL, we want object member names to be
lvalues, so that we can assign values to them using the already
existing machinery. The first rule below ensures that the object is
always explicit, the evaluation context enforces the object to be
evaluated, and finally the second rule initiates the lookup for the
member's location based on the current class of the object.
  rule <k> lvalue(X:Id => this . X) ...</k>
       <env> Env </env>
    requires notBool(X in keys(Env))

  context lvalue((HOLE . _)::Exp)

/*  rule lvalue(objectClosure(<crntClass> C </crntClass>
                            <envStack>... envStackFrame(C,EnvC) EStack </envStack>)
              . X
              => lookupMember(<envStack> envStackFrame(C,EnvC) EStack </envStack>,
                              X))  */
  rule lvalue(objectClosure(Class, ListItem(envStackFrame(Class,Env)) EStack)
              . X
              => lookupMember(ListItem(envStackFrame(Class,Env)) EStack,
                              X))
  rule lvalue(objectClosure(Class, (ListItem(envStackFrame(Class':Id,_)) => .List) _)
              . _X)
    requires Class =/=K Class'
The lookupMember function searches for the given member in the given
environment stack, starting with the most concrete class and going up
in the hierarchy.
// TODO(KORE): clarify sort inferences #1803
syntax Exp ::= lookupMember(List, Id)  [function]
/*
syntax KItem ::= lookupMember(EnvStackCell,Id)  [function]
*/
// rule lookupMember(<envStack> envStackFrame(_, <env>... X|->L ...</env>) ...</envStack>, X)
//   => lookup(L)
rule lookupMember(ListItem(envStackFrame(_, X|->L _)) _, X)
  => lookup(L)
// rule lookupMember(<envStack> envStackFrame(_, <env> Env </env>) => .List ...</envStack>, X)
//   when notBool(X in keys(Env))
rule lookupMember(ListItem(envStackFrame(_, Env)) Rest, X)
  => lookupMember(Rest, X)
  requires notBool(X in keys(Env))
//TODO: beautify the above
endmodule
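As a rough illustration of the two lookupMember rules, here is a Python model (not K code). The name lookup_member and the representation of frames as (class name, dictionary) pairs are assumptions made for the sketch; where the K function would have no applicable rule on an empty stack, the model raises LookupError instead.

```python
from typing import Dict, List, Tuple

# One layer of an object: (class name, member name -> store location).
Frame = Tuple[str, Dict[str, int]]


def lookup_member(env_stack: List[Frame], name: str) -> int:
    """Return the location of `name` in the first layer that declares it."""
    for _class_name, env in env_stack:
        if name in env:
            # First rule: the top layer binds the name.
            return env[name]
        # Second rule: the name is not in this layer, continue with the rest.
    # No rule applies to an empty stack; model that as an error.
    raise LookupError(f"member {name!r} not found")


stack = [("B", {"x": 3}), ("A", {"x": 1, "y": 2}), ("Object", {})]
loc_x = lookup_member(stack, "x")  # found in B, which shadows A's x
loc_y = lookup_member(stack, "y")  # B has no y, so A's layer is used
```

The model returns a location only; in the K definition the result is a lookup(L) task, which is then resolved against the store.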
Go to Lesson 2, KOOL typed dynamic.
Author: Grigore Roșu (grosu@illinois.edu)
Organization: University of Illinois at Urbana-Champaign
Author: Traian Florin Șerbănuță (traian.serbanuta@unibuc.ro)
Organization: University of Bucharest
This is the K semantic definition of the untyped KOOL language. KOOL
is aimed at being a pedagogical and research language that captures
the essence of the object-oriented programming paradigm. Its untyped
variant discussed here is simpler than the typed one, ignoring several
intricate aspects of types in the presence of objects. A program
consists of a set of class declarations. Each class can extend at
most one other class (KOOL is single-inheritance). A class can
declare a set of fields and a set of methods, all public and called
the class' members. Specifically, KOOL includes the
following features:
Class declarations, where a class may or may not explicitly
extend another class. In case a class does not explicitly extend
another class, then it is assumed that it extends the default top-most
and empty (i.e., no members) class called Object. Each class
is required to declare precisely one homonymous method, called its
constructor. Each valid program should contain one class
named Main, whose constructor, Main(), takes no
arguments. The execution of a program consists of creating an object
instance of class Main and invoking the constructor Main() on it,
that is, of executing new Main();.
All features of SIMPLE (see examples/simple/untyped),
i.e., multidimensional arrays, function (here called "method")
abstractions with call-by-value parameter passing style and static
scoping, blocks with locals, input/output, parametric exceptions, and
concurrency via dynamic thread creation/termination and synchronization.
The only change in the syntax of SIMPLE when imported in KOOL is the
function declaration keyword, function, which is changed into
method. The exact same desugaring macros from SIMPLE are
also included in KOOL. We can think of KOOL's classes as embedding
SIMPLE programs (extended with OO constructs, as discussed next).
Object creation using the new C(e1,...,en) expression construct.
An object instance of class C is first created and then the
constructor C(e1,...,en) is implicitly called on that object.
KOOL only allows (and requires) one constructor per class. The class
constructor can be called either implicitly during a new object
creation for the class, or explicitly. The superclass constructor is
not implicitly invoked when a class constructor is invoked; if you
want to invoke the superclass constructor from a subclass constructor
then you have to do it explicitly.
An expression construct this, which evaluates to the current object.
An expression construct super, which is used (only) in
combination with member lookup (see next) to refer to a superclass
field or method.
A member lookup expression construct e.x, where e is an expression
(either an expression expected to evaluate to an object or the super
construct) and x is a class member name, that is, a field or a
method name.
Expression constructs e instanceOf C and (C) e, where e is an
expression expected to evaluate to an object and C a class name.
The former tells whether the class of e is a subclass of C, that is,
whether e can be used as an instance of C, and the latter changes the
class of e to C. These operations always succeed: the former returns
a Boolean value, while the latter changes the current class of e to C
regardless of whether it is safe to do so or not. The typed version
of KOOL will check the safety of casting by ensuring that the instance
class of the object is a subclass of C. In untyped KOOL we do not
want to perform this check because we want to allow the programmer
maximum flexibility: if one always accesses only available members,
then the program can execute successfully despite the potentially
unsafe cast.
There are some specific aspects of KOOL that need to be discussed.
First, KOOL is higher-order, allowing function abstractions to be
treated like any other values in the language. For example, if m
is a method of object e then e.m evaluates to the corresponding
function abstraction. The function abstraction is in fact a closure,
because in addition to the method parameters and body it also
encapsulates the object value (i.e., the environment of the object
together with its current class; see below) that e evaluates to.
This way, function abstractions can be invoked anywhere and have the
capability to change the state of their object. For example, if m
is a method of object e which increments a field c of e when invoked,
and if getm is another method of e which simply returns m when
invoked, then the double application (e.getm())() has the same effect
as e.m(), that is, increments the counter c of e. Note that the
higher-order nature of KOOL was not originally planned; it came as a
natural consequence of evaluating methods to closures and we decided
to keep it. If you do not like it then do not use it.
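Python's bound methods behave much like the method closures described here, so the getm example can be imitated in Python. This is only an analogy written in Python, not KOOL code, and the class name Counter is an assumption made for the example.

```python
class Counter:
    """Plays the role of the object e, with field c and methods m and getm."""

    def __init__(self) -> None:
        self.c = 0

    def m(self) -> None:
        # Increment the field c of the object this method is bound to.
        self.c += 1

    def getm(self):
        # Return m as a value; the bound method remembers its object.
        return self.m


e = Counter()
e.getm()()  # double application, like (e.getm())() in KOOL
e.m()       # direct invocation has the same effect
print(e.c)  # prints 2
```

Both calls update the same object because the value returned by getm carries its object with it, which is the property the text attributes to KOOL's method closures.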
Second, since all the fields and methods are public in KOOL and since
they can be redeclared in subclasses, it is not immediately clear how
to look up the member x when we write e.x and e is different from
super. We distinguish two cases, depending on whether e.x occurs in
a method invocation context (i.e., e.x(...)) or in a field context.
KOOL has dynamic method dispatch, so if e.x is invoked as a method
then x will be searched for starting with the instance class of
the object value to which e evaluates. If e.x occurs in a
non-method-invocation context then x will be treated as a field
(although it may hold a method closure due to the higher-order nature
of KOOL) and thus will be searched starting with the current class of
the object value of e (which, because of this and casting, may be
different from its instance class).
In order to achieve the above, each object value will consist of a
pair holding the current class of the object and an environment stack
with one layer for each class in the object's instance class hierarchy.
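The two lookup modes can be contrasted in a small Python model. This is a sketch under simplifying assumptions, not the K semantics itself: ObjectValue, field_lookup and method_lookup are hypothetical names, and each layer maps member names directly to printable values instead of to store locations.

```python
from typing import Dict, List, NamedTuple, Tuple

# One layer of an object: (class name, member name -> value).
Frame = Tuple[str, Dict[str, str]]


class ObjectValue(NamedTuple):
    current_class: str
    env_stack: List[Frame]  # instance class first, Object last


def lookup_member(frames: List[Frame], name: str) -> str:
    for _class_name, env in frames:
        if name in env:
            return env[name]
    raise LookupError(name)


def field_lookup(obj: ObjectValue, name: str) -> str:
    # Field context: skip the layers above the current class.
    for i, (class_name, _env) in enumerate(obj.env_stack):
        if class_name == obj.current_class:
            return lookup_member(obj.env_stack[i:], name)
    raise LookupError(name)


def method_lookup(obj: ObjectValue, name: str) -> str:
    # Invocation context: dynamic dispatch starts at the instance class.
    return lookup_member(obj.env_stack, name)


layers = [("B", {"x": "B.x"}), ("A", {"x": "A.x"}), ("Object", {})]
as_a = ObjectValue("A", layers)  # a B instance whose current class is A
print(field_lookup(as_a, "x"), method_lookup(as_a, "x"))  # prints A.x B.x
```

The same object value therefore answers differently depending on the context in which the member is accessed, which is why the pair of current class and full environment stack is kept together.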
Third, although KOOL uses dynamic method dispatch, its capabilities
described above are powerful enough to allow us to mimic static
method dispatch. For example, suppose that you want to invoke method
m() statically. Then all you need to do is to declare a local
variable and bind it to m, for example var staticm = m;, and then
call staticm(). This works because staticm is first bound to the
method closure that m evaluates to, and then looked up as any local
variable when invoked. We only enable the dynamic method dispatch
when we have an object member on an application position, e.g., m().
In what follows, we limit our comments to the new, KOOL-specific
aspects of the language. We refer the reader to the untyped SIMPLE
language for documentation on the remaining features, because
those were all borrowed from SIMPLE.
module KOOL-UNTYPED-SYNTAX
  imports DOMAINS-SYNTAX
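The syntax of KOOL extends that of SIMPLE with object-oriented
constructs. We removed from the K annotated syntax of SIMPLE two
constructs, namely the one for function declarations (because we want
to call them methods now) and the one for function application
(because application is not strict in the first argument
anymore; it needs to initiate dynamic method dispatch). The additional
syntax includes:
The tokens Object and Main, for the top-most class and the main class.
The method keyword, which replaces the function keyword of SIMPLE.
Class declarations, with a macro that makes a class with no explicit
parent extend Object.
The expression constructs this, super, new, instanceOf, casting, and
member access.
Application, which is now declared strict(2).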
syntax Id ::= "Object" [token] | "Main" [token]
syntax Stmt ::= "var" Exps ";"
              | "method" Id "(" Ids ")" Block  // called "function" in SIMPLE
              | "class" Id Block               // KOOL
              | "class" Id "extends" Id Block  // KOOL
syntax Exp ::= Int | Bool | String | Id
             | "this"                                 // KOOL
             | "super"                                // KOOL
             | "(" Exp ")"             [bracket]
             | "++" Exp
             | Exp "instanceOf" Id     [strict(1)]    // KOOL
             | "(" Id ")" Exp          [strict(2)]    // KOOL cast
             | "new" Id "(" Exps ")"   [strict(2)]    // KOOL
             | Exp "." Id                             // KOOL
             > Exp "[" Exps "]"        [strict]
             > Exp "(" Exps ")"        [strict(2)]    // was strict in SIMPLE
             | "-" Exp                 [strict]
             | "sizeOf" "(" Exp ")"    [strict]
             | "read" "(" ")"
             > left:
               Exp "*" Exp             [strict, left]
             | Exp "/" Exp             [strict, left]
             | Exp "%" Exp             [strict, left]
             > left:
               Exp "+" Exp             [strict, left]
             | Exp "-" Exp             [strict, left]
             > non-assoc:
               Exp "<" Exp             [strict, non-assoc]
             | Exp "<=" Exp            [strict, non-assoc]
             | Exp ">" Exp             [strict, non-assoc]
             | Exp ">=" Exp            [strict, non-assoc]
             | Exp "==" Exp            [strict, non-assoc]
             | Exp "!=" Exp            [strict, non-assoc]
             > "!" Exp                 [strict]
             > left:
               Exp "&&" Exp            [strict(1), left]
             | Exp "||" Exp            [strict(1), left]
             > "spawn" Block
             > Exp "=" Exp             [strict(2), right]
syntax Ids  ::= List{Id,","}
syntax Exps ::= List{Exp,","}          [strict, overload(exps)]
syntax Val
syntax Vals ::= List{Val,","}          [overload(exps)]
syntax Block ::= "{" "}"
               | "{" Stmt "}"
syntax Stmt ::= Block
              | Exp ";"                               [strict]
              | "if" "(" Exp ")" Block "else" Block   [avoid, strict(1)]
              | "if" "(" Exp ")" Block                [macro]
              | "while" "(" Exp ")" Block
              | "for" "(" Stmt Exp ";" Exp ")" Block  [macro]
              | "return" Exp ";"                      [strict]
              | "return" ";"                          [macro]
              | "print" "(" Exps ")" ";"              [strict]
              | "try" Block "catch" "(" Id ")" Block
              | "throw" Exp ";"                       [strict]
              | "join" Exp ";"                        [strict]
              | "acquire" Exp ";"                     [strict]
              | "release" Exp ";"                     [strict]
              | "rendezvous" Exp ";"                  [strict]
syntax Stmt ::= Stmt Stmt                             [right]
Old desugaring rules, from SIMPLE
rule if (E) S => if (E) S else {}
rule for(Start Cond; Step) {S} => {Start while (Cond) {S Step;}}
rule var E1::Exp, E2::Exp, Es::Exps; => var E1; var E2, Es;  [anywhere]
rule var X::Id = E; => var X; X = E;                         [anywhere]
New desugaring rule
rule class C:Id S => class C extends Object S  // KOOL
endmodule
We first discuss the new configuration of KOOL, which extends that of
SIMPLE. Then we include the semantics of the constructs borrowed from
SIMPLE unchanged; we refrain from discussing those, because they were
already discussed in the K definition of SIMPLE. Then we discuss
changes to SIMPLE's semantics needed for the more general meaning of
the previous SIMPLE constructs (for example for thread spawning,
assignment, etc.). Finally, we discuss in detail the
semantics of the additional KOOL constructs.
module KOOL-UNTYPED
  imports KOOL-UNTYPED-SYNTAX
  imports DOMAINS
KOOL removes one cell and adds two nested cells to the configuration
of SIMPLE. The cell which is removed is the one holding the global
environment, because a KOOL program consists of a set of classes only,
with no global declarations. In fact, since informally speaking each
KOOL class now includes a SIMPLE program, it is safe to say that the
global variables in SIMPLE became class fields in KOOL. Let us now
discuss the new cells that are added to the configuration of SIMPLE.
The cell crntObj holds data pertaining to the current object, that
is, the object environment in which the code in cell k executes:
crntClass holds the current class (which can change as methods of
the current object are invoked); envStack holds the stack of
environments as a list, each layer corresponding to one class in the
object's instance class hierarchy; location, which is optional, holds
the location in the store where the current object is or has to be
located (this is useful both for method closures and for the
semantics of object creation).
The cell classes holds all the declared classes, each class being
held in its own classData cell which contains a name (className), a
parent (baseClass), and the actual member declarations
(declarations).
// the syntax declarations below are required because the sorts are
// referenced directly by a production and, because of the way KIL to KORE
// is implemented, the configuration syntax is not available yet
// should simply work once KIL is removed completely
// check other definitions for this hack as well
syntax EnvCell
syntax ControlCell
syntax EnvStackCell
syntax CrntObjCellFragment
configuration <T color="red">
                <threads color="orange">
                  <thread multiplicity="*" type="Set" color="yellow">
                    <k color="green"> $PGM:Stmt ~> execute </k>
                  //<br/> // TODO(KORE): support latex annotations #1799
                    <control color="cyan">
                      <fstack color="blue"> .List </fstack>
                      <xstack color="purple"> .List </xstack>
                    //<br/> // TODO(KORE): support latex annotations #1799
                      <crntObj color="Fuchsia">  // KOOL
                         <crntClass> Object </crntClass>
                         <envStack> .List </envStack>
                         <location multiplicity="?"> .K </location>
                      </crntObj>
                    </control>
                  //<br/> // TODO(KORE): support latex annotations #1799
                    <env color="violet"> .Map </env>
                    <holds color="black"> .Map </holds>
                    <id color="pink"> 0 </id>
                  </thread>
                </threads>
              //<br/> // TODO(KORE): support latex annotations #1799
                <store color="white"> .Map </store>
                <busy color="cyan">.Set </busy>
                <terminated color="red"> .Set </terminated>
                <input color="magenta" stream="stdin"> .List </input>
                <output color="brown" stream="stdout"> .List </output>
                <nextLoc color="gray"> 0 </nextLoc>
              //<br/> // TODO(KORE): support latex annotations #1799
                <classes color="Fuchsia">        // KOOL
                   <classData multiplicity="*" type="Map" color="Fuchsia">
                      // the Map has as its key the first child of the cell,
                      // in this case the className cell.
                      <className color="Fuchsia"> Main </className>
                      <baseClass color="Fuchsia"> Object </baseClass>
                      <declarations color="Fuchsia"> .K </declarations>
                   </classData>
                </classes>
              </T>
The semantics below is taken over from SIMPLE unchanged.
The semantics of function declaration and invocation, including the
use of the special lambda abstraction value, needs to change
in order to account for the fact that methods are now invoked into
their object's environment. The semantics of function return actually
stays unchanged. Also, the semantics of program initialization is
different: now we have to create an instance of the Main class, which
also calls the constructor Main(), while in SIMPLE we only had to
invoke the function main().
Finally, the semantics of thread spawning needs to change, too: the
parent thread needs to also share its object environment with the
spawned thread (in addition to its local environment, like in SIMPLE).
This is needed in order to be able to spawn method invocations under
dynamic method dispatch; for example, spawn { run(); } will need to
look up the method run() in the newly created thread, an operation
which will most likely fail unless the child thread sees the object
environment of the parent thread. Note that the spawn statement of
KOOL is more permissive than the threads of Java. In fact, the latter
can be implemented in terms of our spawn; see the program
threads.kool for a sketch.
Below is a subset of the values of SIMPLE, which are also values
of KOOL. We will add other values later in the semantics, such as
object and method closures.
syntax Val ::= Int | Bool | String
             | array(Int,Int)
syntax Exp ::= Val
syntax Exps ::= Vals
syntax KResult ::= Val
syntax KResult ::= Vals
The semantics below are taken verbatim from the untyped SIMPLE
definition.
syntax KItem ::= "undefined"
rule <k> var X:Id; => .K ...</k>
     <env> Env => Env[X <- L] </env>
     <store>... .Map => L |-> undefined ...</store>
     <nextLoc> L:Int => L +Int 1 </nextLoc>
context var _:Id[HOLE];
rule <k> var X:Id[N:Int]; => .K ...</k>
     <env> Env => Env[X <- L] </env>
     <store>... .Map => L |-> array(L +Int 1, N)
                         (L +Int 1) ... (L +Int N) |-> undefined ...</store>
     <nextLoc> L:Int => L +Int 1 +Int N </nextLoc>
  requires N >=Int 0
syntax Id ::= "$1" [token] | "$2" [token]
rule var X:Id[N1:Int, N2:Int, Vs:Vals];
  => var X[N1];
     {
       var $1=X;
       for(var $2=0; $2 <= N1 - 1; ++$2) {
         var X[N2,Vs];
         $1[$2] = X;
       }
     }
rule <k> X:Id => V ...</k>
     <env>... X |-> L ...</env>
     <store>... L |-> V:Val ...</store>
context ++(HOLE => lvalue(HOLE))
rule <k> ++loc(L) => I +Int 1 ...</k>
     <store>... L |-> (I:Int => I +Int 1) ...</store>
rule I1 + I2 => I1 +Int I2
rule Str1 + Str2 => Str1 +String Str2
rule I1 - I2 => I1 -Int I2
rule I1 * I2 => I1 *Int I2
rule I1 / I2 => I1 /Int I2 requires I2 =/=K 0
rule I1 % I2 => I1 %Int I2 requires I2 =/=K 0
rule - I => 0 -Int I
rule I1 < I2 => I1 <Int I2
rule I1 <= I2 => I1 <=Int I2
rule I1 > I2 => I1 >Int I2
rule I1 >= I2 => I1 >=Int I2
rule V1:Val == V2:Val => V1 ==K V2
rule V1:Val != V2:Val => V1 =/=K V2
rule ! T => notBool(T)
rule true  && E => E
rule false && _ => false
rule true  || _ => true
rule false || E => E
rule V:Val[N1:Int, N2:Int, Vs:Vals] => V[N1][N2, Vs]  [anywhere]
rule array(L,_)[N:Int] => lookup(L +Int N)            [anywhere]
rule sizeOf(array(_,N)) => N
The semantics of function application needs to change into dynamic
method dispatch invocation, which is defined shortly. However,
interestingly, the semantics of return stays unchanged.
rule <k> return(V:Val); ~> _ => V ~> K </k>
     <control>
       <fstack> ListItem(fstackFrame(Env,K,XS,<crntObj> CO </crntObj>)) => .List ...</fstack>
       <xstack> _ => XS </xstack>
       <crntObj> _ => CO </crntObj>
     </control>
     <env> _ => Env </env>
syntax Val ::= "nothing"
rule return; => return nothing;
rule <k> read() => I ...</k> <input> ListItem(I:Int) => .List ...</input>
context (HOLE => lvalue(HOLE)) = _
rule <k> loc(L) = V:Val => V ...</k> <store>... L |-> (_ => V) ...</store>
rule {} => .K
rule <k> { S } => S ~> setEnv(Env) ...</k> <env> Env </env>
rule S1::Stmt S2::Stmt => S1 ~> S2
rule _:Val; => .K
rule if ( true) S else _ => S
rule if (false) _ else S => S
rule while (E) S => if (E) {S while(E)S}
rule <k> print(V:Val, Es => Es); ...</k> <output>... .List => ListItem(V) </output>
rule print(.Vals); => .K
syntax KItem ::= xstackFrame(Id,Stmt,K,Map,K)
// TODO(KORE): drop the additional production once parsing issue #1842 is fixed
               | (Id,Stmt,K,Map,K)
syntax KItem ::= "popx"
rule <k> (try S1 catch(X) {S2} => S1 ~> popx) ~> K </k>
     <control>
       <xstack> .List => ListItem(xstackFrame(X, S2, K, Env, C)) ...</xstack>
       C
     </control>
     <env> Env </env>
rule <k> popx => .K ...</k>
     <xstack> ListItem(_) => .List ...</xstack>
rule <k> throw V:Val; ~> _ => { var X = V; S2 } ~> K </k>
     <control>
       <xstack> ListItem(xstackFrame(X, S2, K, Env, C)) => .List ...</xstack>
       (_ => C)
     </control>
     <env> _ => Env </env>
Thread spawning needs a new semantics, because we want the child
thread to also share the object environment with its parent. The new
semantics of thread spawning will be defined shortly. However,
interestingly, the other concurrency constructs keep their semantics
from SIMPLE unchanged.
// TODO(KORE): ..Bag should be . throughout this definition #1772
rule (<thread>... <k>.K</k> <holds>H</holds> <id>T</id> ...</thread> => .Bag)
/*
rule (<thread>... <k>.</k> <holds>H</holds> <id>T</id> ...</thread> => .)
*/
     <busy> Busy => Busy -Set keys(H) </busy>
     <terminated>... .Set => SetItem(T) ...</terminated>
rule <k> join T:Int; => .K ...</k>
     <terminated>... SetItem(T) ...</terminated>
rule <k> acquire V:Val; => .K ...</k>
     <holds>... .Map => V |-> 0 ...</holds>
     <busy> Busy (.Set => SetItem(V)) </busy>
  requires (notBool(V in Busy:Set))
rule <k> acquire V; => .K ...</k>
     <holds>... V:Val |-> (N:Int => N +Int 1) ...</holds>
rule <k> release V:Val; => .K ...</k>
     <holds>... V |-> (N => N:Int -Int 1) ...</holds>
  requires N >Int 0
rule <k> release V; => .K ...</k>
     <holds>... V:Val |-> 0 => .Map ...</holds>
     <busy>... SetItem(V) => .Set ...</busy>
rule <k> rendezvous V:Val; => .K ...</k>
     <k> rendezvous V; => .K ...</k>
syntax Stmt ::= mkDecls(Ids,Vals)  [function]
rule mkDecls((X:Id, Xs:Ids), (V:Val, Vs:Vals)) => var X=V; mkDecls(Xs,Vs)
rule mkDecls(.Ids,.Vals) => {}
// TODO(KORE): clarify sort inferences #1803
syntax Exp ::= lookup(Int)
/*
syntax KItem ::= lookup(Int)
*/
rule <k> lookup(L) => V ...</k> <store>... L |-> V:Val ...</store>
syntax KItem ::= setEnv(Map)
rule <k> setEnv(Env) => .K ...</k> <env> _ => Env </env>
rule (setEnv(_) => .K) ~> setEnv(_)
// TODO: How can we make sure that the second rule above applies before the first one?
//       Probably we'll deal with this using strategies, eventually.
syntax Exp ::= lvalue(K)
syntax Val ::= loc(Int)
rule <k> lvalue(X:Id => loc(L)) ...</k> <env>... X |-> L:Int ...</env>
context lvalue(_::Exp[HOLE::Exps])
context lvalue(HOLE::Exp[_::Exps])
rule lvalue(lookup(L:Int) => loc(L))
syntax Map ::= Int "..." Int "|->" K  [function]
rule N...M |-> _ => .Map  requires N >Int M
rule N...M |-> K => N |-> K (N +Int 1)...M |-> K  requires N <=Int M
When we extend a language, sometimes we need to do more than just add
new language constructs and semantics for them. Sometimes we want to
also extend the semantics of existing language constructs, in order to
get more from them.
In SIMPLE, once all the global declarations were processed, the
function main() was invoked. In KOOL, the global declarations are
classes, and their specific semantics is given shortly; essentially,
they are pre-processed one by one and added into the classes cell
structure in the configuration.
Once all the classes are processed, the computation item execute,
which was placed right after the program in the initial
configuration, is reached. In SIMPLE, the program was initialized by
calling the function main(). In KOOL, the program is initialized by
creating an object instance of class Main. This will also implicitly
call the method Main() (the Main class constructor). The emptiness
of the env cell below is just a sanity check, to make sure that the
user has not declared anything but classes at the top level of the
program.
syntax KItem ::= "execute"
rule <k> execute => new Main(.Exps); </k> <env> .Map </env>
The semantics of new (defined below) requires the execution of all
the class' declarations (and also of its superclasses').
Before we can define the semantics of method application (previously
called function application in SIMPLE), we need to add two more values
to the language, namely object and method closures:
syntax Val ::= objectClosure(Id, List)
             | methodClosure(Id,Int,Ids,Stmt)
An object value consists of an objectClosure term holding the current
class of the object and the environment stack of the object. The
current class of an object will always be one of the classes mapped
to an environment in the environment stack of the object. A method
closure encapsulates the method's parameters and code (last two
arguments), as well as the object context in which the method code
should execute. This object context includes the current class of
the object (the first argument of methodClosure) and the object
environment stack (located in the object stored at the location
specified as the second argument of methodClosure).
KOOL has a complex mechanism to invoke methods, because it allows both
dynamic method dispatch and methods as first-class-citizen values (the
latter making it a higher-order language). The invocation mechanism
will be defined later. What is sufficient to know for now is that
the two arguments of the application construct eventually reduce to
values, the first being a method closure and the latter a list of
values. The semantics of the method closure application is then as
expected: the local environment and control are stacked, then we
switch to the method closure's class and object environment and
execute the method body. The mkDecls construct is the one that came
with the unchanged semantics of SIMPLE above.
syntax KItem ::= fstackFrame(Map,K,List,K)
// TODO(KORE): drop the additional production once parsing issue #1842 is fixed
               | (Map,K,K)
rule <k> methodClosure(Class,OL,Xs,S)(Vs:Vals) ~> K
         => mkDecls(Xs,Vs) S return; </k>
     <env> Env => .Map </env>
     <store>... OL |-> objectClosure(_, EnvStack)...</store>
   //<br/> // TODO(KORE): support latex annotations #1799
     <control>
        <xstack> XS </xstack>
        <fstack> .List => ListItem(fstackFrame(Env, K, XS, <crntObj> Obj' </crntObj>))
        ...</fstack>
        <crntObj> Obj' => <crntClass> Class </crntClass> <envStack> EnvStack </envStack> </crntObj>
     </control>
We want to extend the semantics of spawn to also share the
current object environment with the child thread, in addition to the
current environment. This extension will allow us to also use method
invocations in the spawned statements, which will be thus looked up as
expected, using dynamic method dispatch. This lookup operation would
fail if the child thread did not have access to its parent's object
environment.
rule <thread>...
       <k> spawn S => !T:Int ...</k>
       <env> Env </env>
       <crntObj> Obj </crntObj>
     ...</thread>
     (.Bag => <thread>...
            <k> S </k>
            <env> Env </env>
            <id> !T </id>
            <crntObj> Obj </crntObj>
          ...</thread>)
Initially, the classes forming the program are moved into their
corresponding cells:
rule <k> class Class1 extends Class2 { S } => .K ...</k>
     <classes>... (.Bag => <classData>
                          <className> Class1 </className>
                          <baseClass> Class2 </baseClass>
                          <declarations> S </declarations>
                      </classData>)
     ...</classes>
Like in SIMPLE, method names are added to the environment and bound
to their code. However, unlike in SIMPLE where each function was
executed in the same environment, namely the program global
environment, a method in KOOL needs to be executed into its object's
environment. Thus, methods evaluate to closures, which encapsulate
their object's context (i.e., the current class and environment stack
of the object) in addition to method's parameters and body. This
approach to bind method names to method closures in the environment
will also allow objects to pass their methods to other objects, to
dynamically change their methods by assigning them other method
closures, and even to allow all these to be done from other objects.
This gives the KOOL programmer a lot of power; one should use this
power wisely, though, because programs can easily become hard to
understand and reason about if one overuses these features.
rule <k> method F:Id(Xs:Ids) S => .K ...</k>
     <crntClass> Class:Id </crntClass>
     <location> OL:Int </location>
     <env> Env => Env[F <- L] </env>
     <store>... .Map => L |-> methodClosure(Class,OL,Xs,S) ...</store>
     <nextLoc> L => L +Int 1 </nextLoc>
The semantics of new consists of two actions: memory
allocation for the new object and execution of the corresponding
constructor. Then the created object is returned as the result of the
new operation; the value returned by the constructor, if any,
is discarded. The current environment and object are stored onto the
stack and recovered after new (according to the semantics of
return borrowed from SIMPLE, when the statement return this;
in the rule below is reached and evaluated), because the object
creation part of new will destroy them.
The rule below also initializes the object creation process by
emptying the local environment and the current object, and allocating
a location in the store where the created object will be eventually
stored (this is what the storeObj task after the object creation
task in the rule below will do; its rule is defined shortly). The
location where the object will be stored is also made available in
the crntObj cell, so that method closures can refer to it (see rule
above).
syntax KItem ::= "envStackFrame" "(" Id "," Map ")"
rule <k> new Class:Id(Vs:Vals) ~> K
         => create(Class) ~> storeObj ~> Class(Vs); return this; </k>
     <env> Env => .Map </env>
     <nextLoc> L:Int => L +Int 1 </nextLoc>
   //<br/> // TODO(KORE): support latex annotations #1799
     <control> <xstack> XS </xstack>
       <crntObj> Obj
                 => <crntClass> Object </crntClass>
                    <envStack> ListItem(envStackFrame(Object, .Map)) </envStack>
                    <location> L </location>
       </crntObj>
       <fstack> .List => ListItem(fstackFrame(Env, K, XS, <crntObj> Obj </crntObj>)) ...</fstack>
     </control>
The creation of a new object (the memory allocation part only) is
a recursive process, which first creates an object for the
superclass. A memory object representation is a layered structure:
for each class on the path from the instance class to the root of the
hierarchy there is a layer including the memory allocated for the
members (both fields and methods) of that class.
syntax KItem ::= create(Id)
rule <k> create(Class:Id)
         => create(Class1) ~> setCrntClass(Class) ~> S ~> addEnvLayer ...</k>
     <className> Class </className>
     <baseClass> Class1:Id </baseClass>
     <declarations> S </declarations>
rule <k> create(Object) => .K ...</k>
The next operation sets the current class of the current object.
This is necessary to be done at each layer, because the current class
of the object is enclosed as part of the method closures (see the
semantics of method declarations above).
syntax KItem ::= setCrntClass(Id)
rule <k> setCrntClass(C) => .K ...</k>
     <crntClass> _ => C </crntClass>
The next operation adds a new tagged environment layer to the
current object and gets ready for the next layer by clearing the
environment (note that create expects the environment to be empty).
syntax KItem ::= "addEnvLayer"
rule <k> addEnvLayer => .K ...</k>
     <env> Env => .Map </env>
     <crntClass> Class:Id </crntClass>
     <envStack> .List => ListItem(envStackFrame(Class, Env)) ...</envStack>
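The interplay of create, the class declarations and addEnvLayer can be sketched as a recursive Python function. This is a simplified model, not the K rules: the class table format (class name mapped to its base class and a list of member names), the function name create, and the convention that each member takes exactly one fresh location are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# One layer of an object: (class name, member name -> store location).
Frame = Tuple[str, Dict[str, int]]
# Hypothetical class table: class name -> (base class, declared member names).
ClassTable = Dict[str, Tuple[str, List[str]]]


def create(cls: str, classes: ClassTable,
           env_stack: List[Frame], next_loc: int) -> int:
    """Build the layers for `cls` on top of `env_stack`; return the next free location."""
    if cls == "Object":
        return next_loc  # create(Object) does nothing
    base, members = classes[cls]
    # The superclass layers are created first.
    next_loc = create(base, classes, env_stack, next_loc)
    env: Dict[str, int] = {}
    for name in members:  # executing the declarations of this class
        env[name] = next_loc
        next_loc += 1
    env_stack.insert(0, (cls, env))  # addEnvLayer: push the tagged layer
    return next_loc


classes = {"A": ("Object", ["x"]), "B": ("A", ["x", "y"])}
stack: List[Frame] = [("Object", {})]  # the layer that new starts from
next_free = create("B", classes, stack, 1)  # location 0 is reserved for the object
```

After the call, the stack holds one layer per class, with the instance class B on top, matching the layered representation described above.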
The following operation stores the created object at the location
reserved by new. Note that the location reserved by new was
temporarily stored in the crntObj cell precisely for this purpose.
Now that the newly created object is stored at its location and that
all method closures are aware of it, the location is unnecessary and
thus we delete it from the crntObj cell.
syntax KItem ::= "storeObj"
rule <k> storeObj => .K ...</k>
     <crntObj>
       <crntClass> CC </crntClass>
       <envStack> ES </envStack>
       (<location> L:Int </location> => .Bag)
     </crntObj>
     <store>... .Map => L |-> objectClosure(CC, ES) ...</store>
The semantics of this is straightforward: evaluate to the
current object.
rule <k> this => objectClosure(CC, ES) ...</k>
     <crntObj>
       <crntClass> CC </crntClass>
       <envStack> ES </envStack>
     </crntObj>
We can access an object member (field or method) either explicitly,
using the construct e.x, or implicitly, using only the member
name x directly. The borrowed semantics of SIMPLE will
already look up a sole name in the local environment. The first rule
below reduces implicit member access to explicit access when the name
cannot be found in the local environment. There are two cases to
analyze for explicit object member access, depending upon whether the
object is a proper object or it is just a redirection to the parent
class via the construct super. In the first case, we
evaluate the object expression and look up the member starting with
the current class (static scoping). Note the use of the conditional
evaluation context. In the second case, we just look up the member
starting with the superclass of the current class. In both cases,
the lookupMember task eventually yields a lookup(L) task for some
appropriate location L, which will be further solved with the
corresponding rule borrowed from SIMPLE. Note that the current object
is not altered by super, so future method invocations see the entire
object, as needed for dynamic method dispatch.
rule <k> X:Id => this . X ...</k> <env> Env:Map </env>
  requires notBool(X in keys(Env))
context HOLE._::Id requires (HOLE =/=K super)
// TODO: explain how Assoc matching has been replaced with two rules here.
// Maybe also improve it a bit.
/*  rule objectClosure(<crntClass> Class:Id </crntClass>
                       <envStack>... envStackFrame(Class,EnvC) EStack </envStack>)
         . X:Id
      => lookupMember(envStackFrame(Class,EnvC) EStack, X) */
rule objectClosure(Class:Id, ListItem(envStackFrame(Class,Env)) EStack)
     . X:Id
  => lookupMember(ListItem(envStackFrame(Class,Env)) EStack, X)
rule objectClosure(Class:Id, (ListItem(envStackFrame(Class':Id,_)) => .List) _)
     . _X:Id
  requires Class =/=K Class'
/*  rule <k> super . X => lookupMember(EStack, X) ...</k>
         <crntClass> Class </crntClass>
         <envStack>... envStackFrame(Class,EnvC) EStack </envStack> */
rule <k> super . X => lookupMember(EStack, X) ...</k>
     <crntClass> Class:Id </crntClass>
     <envStack> ListItem(envStackFrame(Class,_)) EStack </envStack>
rule <k> super . _X ...</k>
     <crntClass> Class </crntClass>
     <envStack> ListItem(envStackFrame(Class':Id,_)) => .List ...</envStack>
  requires Class =/=K Class'
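The effect of the two super rules can be approximated by the following Python sketch: find the layer of the current class, then search only the layers beyond it. The function names are hypothetical, the layers are assumed to be (class name, dictionary) pairs as in the envStack cell, and the model reports a missing member with LookupError where the K execution would get stuck.

```python
from typing import Dict, List, Tuple

# One layer of an object: (class name, member name -> store location).
Frame = Tuple[str, Dict[str, int]]


def lookup_member(frames: List[Frame], name: str) -> int:
    for _class_name, env in frames:
        if name in env:
            return env[name]
    raise LookupError(name)


def super_lookup(current_class: str, env_stack: List[Frame], name: str) -> int:
    """Look `name` up starting with the superclass of `current_class`."""
    for i, (class_name, _env) in enumerate(env_stack):
        if class_name == current_class:
            # Skip the current class's own layer, keep its ancestors.
            return lookup_member(env_stack[i + 1:], name)
    raise LookupError(name)


layers = [("B", {"m": 5}), ("A", {"m": 2}), ("Object", {})]
loc_super_m = super_lookup("B", layers, "m")  # A's m, not B's
```

Only the starting point of the search differs from ordinary member access; the object itself is the same list of layers in both cases.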
Unlike in SIMPLE, in KOOL application was declared strict only in its
second argument. That is because we want to ensure dynamic method
dispatch when the first argument is a method access. As a
consequence, we need to consider all the cases of interest for the
first argument and to explicitly say what to do in each case. In all
cases except for method access in a proper object (i.e., not super),
we want the same behavior for the first argument as
if it were not in a method invocation position. When it is a member
access (the third rule below), we look it up starting with the
instance class of the corresponding object. This ensures dynamic
dispatch for methods; it actually dynamically dispatches field
accesses, too, which is correct in KOOL, because one can assign method
closures to fields and the field appears in a method invocation
context. The last context declaration below says that method
applications or array accesses are also allowed as first argument to
applications; that is because methods are allowed to return methods
and arrays are allowed to hold methods in KOOL, since it is
higher-order. If that is the case, then we want to evaluate the
method call or the array access.
rule <k> (X:Id => V)(_:Exps) ...</k> <env>... X |-> L ...</env> <store>... L |-> V:Val ...</store> rule <k> (X:Id => this . X)(_:Exps) ...</k> <env> Env </env> requires notBool(X in keys(Env)) context HOLE._::Id(_) requires HOLE =/=K super rule (objectClosure(_, EStack) . X => lookupMember(EStack, X:Id))(_:Exps) /* rule <k> (super . X => lookupMember(EStack,X))(_:Exps)...</k> <crntClass> Class </crntClass> <envStack>... envStackFrame(Class,_) EStack </envStack> */ rule <k> (super . X => lookupMember(EStack,X))(_:Exps)...</k> <crntClass> Class </crntClass> <envStack> ListItem(envStackFrame(Class,_)) EStack </envStack> rule <k> (super . _X)(_:Exps) ...</k> <crntClass> Class </crntClass> <envStack> ListItem(envStackFrame(Class':Id,_)) => .List ...</envStack> requires Class =/=K Class' // TODO(KORE): fix getKLabel #1801 rule (A:Exp(B:Exps))(C:Exps) => A(B) ~> #freezerFunCall(C) rule (A:Exp[B:Exps])(C:Exps) => A[B] ~> #freezerFunCall(C) rule V:Val ~> #freezerFunCall(C:Exps) => V(C) syntax KItem ::= "#freezerFunCall" "(" K ")" /* context HOLE(_:Exps) when getKLabel(HOLE) ==K #klabel(`_(_)`) orBool getKLabel(HOLE) ==K #klabel(`_[_]`) */
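The difference between a member access in call position and one elsewhere can be sketched in Python as follows. This is an illustration under assumed names (member_as_value, member_in_call) and an assumed encoding of an object as a pair of its current class and its list of (class, environment) frames, most derived class first. It is not the K definition itself.

```python
def lookup_member(frames, name):
    """Return the binding of `name` in the first frame that declares it."""
    for _cls, env in frames:
        if name in env:
            return env[name]
    raise KeyError(name)  # the K semantics would get stuck instead


def member_as_value(obj, name):
    """obj.name outside call position: search from the object's current class."""
    current_class, frames = obj
    classes = [cls for cls, _env in frames]
    start = classes.index(current_class)
    return lookup_member(frames[start:], name)


def member_in_call(obj, name):
    """obj.name(...): search the whole stack, i.e. from the instance class."""
    _current_class, frames = obj
    return lookup_member(frames, name)


frames = [("C", {"m": "C.m"}), ("B", {"m": "B.m"}), ("Object", {})]
as_b = ("B", frames)  # a C instance currently viewed as a B
print(member_as_value(as_b, "m"))  # B.m
print(member_in_call(as_b, "m"))   # C.m
```

The second call ignores the current class of the object, which is what gives dynamic dispatch.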
Eventually, each of the rules above produces a lookup(L)
task as a replacement for the method. When that happens, we just
look up the value at location L:
rule <k> (lookup(L) => V)(_:Exps) ...</k> <store>... L |-> V:Val ...</store>
The value V looked up above is expected to be a method closure,
in which case the semantics of method application given above will
apply. Otherwise, the execution will get stuck.
The instanceOf construct searches the object environment for a layer
corresponding to the desired class. It returns true iff it can find
the class, otherwise it returns false; it only gets stuck when its first
argument does not evaluate to an object.
rule objectClosure(_, ListItem(envStackFrame(C,_)) _) instanceOf C => true
rule objectClosure(_, (ListItem(envStackFrame(C,_)) => .List) _) instanceOf C'
  requires C =/=K C'
//TODO: remove the sort cast ::Id of C above, when sort inference bug fixed
rule objectClosure(_, .List) instanceOf _ => false
In untyped KOOL, we prefer to not check the validity of casting. In
other words, any cast is allowed on any object, simply changing the
current class of the object to the desired class. The execution will
get stuck later if one attempts to access a field which is not
available. Moreover, the execution may complete successfully even
in the presence of invalid casts, provided that each accessed member
during the current execution is, or happens to be, available.
rule (C) objectClosure(_ , EnvStack) => objectClosure(C ,EnvStack)
Here we define all the auxiliary constructs used in the above
KOOL-specific semantics (those used in the SIMPLE fragment
have already been defined in a corresponding section above).
The machinery borrowed from the semantics of SIMPLE allows us
to enrich the set of lvalues, thus allowing new means to assign
values to locations. In KOOL, we want object member names to be
lvalues, so that we can assign values to them using the already
existing machinery. The first rule below ensures that the object is
always explicit, the evaluation context enforces the object to be
evaluated, and finally the second rule initiates the lookup for the
member's location based on the current class of the object.
rule <k> lvalue(X:Id => this . X) ...</k> <env> Env </env> requires notBool(X in keys(Env)) context lvalue((HOLE . _)::Exp) /* rule lvalue(objectClosure(<crntClass> C </crntClass> <envStack>... envStackFrame(C,EnvC) EStack </envStack>) . X => lookupMember(<envStack> envStackFrame(C,EnvC) EStack </envStack>, X)) */ rule lvalue(objectClosure(Class, ListItem(envStackFrame(Class,Env)) EStack) . X => lookupMember(ListItem(envStackFrame(Class,Env)) EStack, X)) rule lvalue(objectClosure(Class, (ListItem(envStackFrame(Class':Id,_)) => .List) _) . _X) requires Class =/=K Class'
The lookupMember function searches for the given member in the given
environment stack, starting with the most concrete class and going up
in the hierarchy.
// TODO(KORE): clarify sort inferences #1803
syntax Exp ::= lookupMember(List, Id) [function]
/*
syntax KItem ::= lookupMember(EnvStackCell,Id) [function]
*/

// rule lookupMember(<envStack> envStackFrame(_, <env>... X|->L ...</env>) ...</envStack>, X)
//   => lookup(L)
rule lookupMember(ListItem(envStackFrame(_, X|->L _)) _, X)
  => lookup(L)

// rule lookupMember(<envStack> envStackFrame(_, <env> Env </env>) => .List ...</envStack>, X)
//   when notBool(X in keys(Env))
rule lookupMember(ListItem(envStackFrame(_, Env)) Rest, X)
  => lookupMember(Rest, X)
  requires notBool(X in keys(Env))
//TODO: beautify the above

endmodule
Go to Lesson 2, KOOL typed dynamic.
Author: Grigore Roșu (grosu@illinois.edu)
Organization: University of Illinois at Urbana-Champaign
Author: Traian Florin Șerbănuță (traian.serbanuta@unibuc.ro)
Organization: University of Bucharest
This is the K dynamic semantics of the typed KOOL language. It is
very similar to the semantics of the untyped KOOL, the difference
being that we now check the typing policy dynamically. Since we now
have to declare the types of variables and methods, we adopt a syntax
for those which is close to Java. Like in the semantics of
untyped KOOL, where we borrowed almost all the semantics of untyped
SIMPLE, we are going to also borrow much of the semantics of
dynamically typed SIMPLE here. We will highlight the differences
between the dynamically typed and the untyped KOOL as we proceed with
the semantics. In general, the type policy of the typed KOOL language
is similar to that of Java. You may find it useful to also read
the discussion in the preamble of the static semantics of typed KOOL
before proceeding.
module KOOL-TYPED-DYNAMIC-SYNTAX imports DOMAINS-SYNTAX
Like for the untyped KOOL language, the syntax of typed KOOL extends
that of typed SIMPLE with object-oriented constructs.
The syntax below was produced by copying and modifying/extending the
syntax of dynamically typed SIMPLE. In fact, the only change we made
to the existing syntax of dynamically typed SIMPLE was to change the
strictness of the application construct like in untyped KOOL, from
strict to strict(2) (because application is no longer
strict in the first argument, due to dynamic method dispatch).
The KOOL-specific syntactic extensions are identical to those in
untyped KOOL.
syntax Id ::= "Object" [token] | "Main" [token]
syntax Type ::= "void" | "int" | "bool" | "string" | Id // KOOL class | Type "[" "]" | "(" Type ")" [bracket] > Types "->" Type // TODO(KORE): drop klabel once issues #1913 are fixed syntax Types ::= List{Type,","} [symbol(_,_::Types)] /* syntax Types ::= List{Type,","} */
syntax Param ::= Type Id syntax Params ::= List{Param,","} syntax Stmt ::= Type Exps ";" [avoid] | Type Id "(" Params ")" Block // stays like in typed SIMPLE | "class" Id Block // KOOL | "class" Id "extends" Id Block // KOOL
syntax Exp ::= Int | Bool | String | Id | "this" // KOOL | "super" // KOOL | "(" Exp ")" [bracket] | "++" Exp | Exp "instanceOf" Id [strict(1)] // KOOL | "(" Id ")" Exp [strict(2)] // KOOL cast | "new" Id "(" Exps ")" [strict(2)] // KOOL | Exp "." Id // KOOL > Exp "[" Exps "]" [strict] > Exp "(" Exps ")" [strict(2)] // was strict in SIMPLE | "-" Exp [strict] | "sizeOf" "(" Exp ")" [strict] | "read" "(" ")" > left: Exp "*" Exp [strict, left] | Exp "/" Exp [strict, left] | Exp "%" Exp [strict, left] > left: Exp "+" Exp [strict, left] | Exp "-" Exp [strict, left] > non-assoc: Exp "<" Exp [strict, non-assoc] | Exp "<=" Exp [strict, non-assoc] | Exp ">" Exp [strict, non-assoc] | Exp ">=" Exp [strict, non-assoc] | Exp "==" Exp [strict, non-assoc] | Exp "!=" Exp [strict, non-assoc] > "!" Exp [strict] > left: Exp "&&" Exp [strict(1), left] | Exp "||" Exp [strict(1), left] > "spawn" Block > Exp "=" Exp [strict(2), right] syntax Exps ::= List{Exp,","} [strict, overload(exps)] syntax Val syntax Vals ::= List{Val,","} [overload(exps)]
syntax Block ::= "{" "}" | "{" Stmt "}" syntax Stmt ::= Block | Exp ";" [strict] | "if" "(" Exp ")" Block "else" Block [avoid, strict(1)] | "if" "(" Exp ")" Block [macro] | "while" "(" Exp ")" Block | "for" "(" Stmt Exp ";" Exp ")" Block [macro] | "print" "(" Exps ")" ";" [strict] | "return" Exp ";" [strict] | "return" ";" | "try" Block "catch" "(" Param ")" Block | "throw" Exp ";" [strict] | "join" Exp ";" [strict] | "acquire" Exp ";" [strict] | "release" Exp ";" [strict] | "rendezvous" Exp ";" [strict] syntax Stmt ::= Stmt Stmt [right]
rule if (E) S => if (E) S else {} rule for(Start Cond; Step) {S::Stmt} => {Start while(Cond){S Step;}} rule T::Type E1::Exp, E2::Exp, Es::Exps; => T E1; T E2, Es; [anywhere] rule T::Type X::Id = E; => T X; X = E; [anywhere] rule class C:Id S => class C extends Object S // KOOL endmodule
We first discuss the new configuration, then we include the semantics of
the constructs borrowed from SIMPLE which stay unchanged, then those
whose semantics had to change, and finally the semantics of the
KOOL-specific constructs.
module KOOL-TYPED-DYNAMIC imports KOOL-TYPED-DYNAMIC-SYNTAX imports DOMAINS
The configuration of dynamically typed KOOL is almost identical to
that of its untyped variant. The only difference is the cell
returnType, inside the control cell, whose role is to
hold the expected return type of the invoked method. That is because
we want to dynamically check that the value that a method returns has
the expected type.
// the syntax declarations below are required because the sorts are // referenced directly by a production and, because of the way KIL to KORE // is implemented, the configuration syntax is not available yet // should simply work once KIL is removed completely // check other definitions for this hack as well syntax EnvCell syntax ControlCellFragment syntax EnvStackCell syntax CrntObjCellFragment configuration <T color="red"> <threads color="orange"> <thread multiplicity="*" type="Set" color="yellow"> <k color="green"> ($PGM:Stmt ~> execute) </k> //<br/> // TODO(KORE): support latex annotations #1799 <control color="cyan"> <fstack color="blue"> .List </fstack> <xstack color="purple"> .List </xstack> <returnType color="LimeGreen"> void </returnType> // KOOL //<br/> // TODO(KORE): support latex annotations #1799 <crntObj color="Fuchsia"> // KOOL <crntClass> Object </crntClass> <envStack> .List </envStack> <location multiplicity="?"> .K </location> </crntObj> </control> //<br/> // TODO(KORE): support latex annotations #1799 <env color="violet"> .Map </env> <holds color="black"> .Map </holds> <id color="pink"> 0 </id> </thread> </threads> //<br/> // TODO(KORE): support latex annotations #1799 <store color="white"> .Map </store> <busy color="cyan">.Set </busy> <terminated color="red"> .Set </terminated> <input color="magenta" stream="stdin"> .List </input> <output color="brown" stream="stdout"> .List </output> <nextLoc color="gray"> 0 </nextLoc> //<br/> // TODO(KORE): support latex annotations #1799 <classes color="Fuchsia"> // KOOL <classData multiplicity="*" type="Map" color="Fuchsia"> <className color="Fuchsia"> Main </className> <baseClass color="Fuchsia"> Object </baseClass> <declarations color="Fuchsia"> .K </declarations> </classData> </classes> </T>
The semantics below is taken over from dynamically typed SIMPLE
unchanged. Like for untyped KOOL, the semantics of function/method
declaration and invocation, and of program initialization needs to
change. Moreover, due to subtyping, the semantics of several imported
SIMPLE constructs can be made more general, such as that of the
return statement, that of the assignment, and that of the exceptions.
We removed all these from the imported semantics of SIMPLE below and
gave their modified semantics right after, together with the extended
semantics of thread spawning (which is identical to that of untyped
KOOL).
syntax Val ::= Int | Bool | String | array(Type,Int,Int) syntax Exp ::= Val syntax Exps ::= Vals syntax KResult ::= Val syntax KResult ::= Vals syntax KItem ::= undefined(Type) rule <k> T:Type X:Id; => .K ...</k> <env> Env => Env[X <- L] </env> <store>... .Map => L |-> undefined(T) ...</store> <nextLoc> L:Int => L +Int 1 </nextLoc> rule <k> T:Type X:Id[N:Int]; => .K ...</k> <env> Env => Env[X <- L] </env> <store>... .Map => L |-> array(T, L +Int 1, N) (L +Int 1)...(L +Int N) |-> undefined(T) ...</store> <nextLoc> L:Int => L +Int 1 +Int N </nextLoc> requires N >=Int 0 context _:Type _::Exp[HOLE::Exps]; syntax Id ::= "$1" [token] | "$2" [token] rule T:Type X:Id[N1:Int, N2:Int, Vs:Vals]; => T[]<Vs> X[N1]; { T[][]<Vs> $1=X; for(int $2=0; $2 <= N1 - 1; ++$2) { T X[N2,Vs]; $1[$2] = X; } } rule <k> X:Id => V ...</k> <env>... X |-> L ...</env> <store>... L |-> V:Val ...</store> context ++(HOLE => lvalue(HOLE)) rule <k> ++loc(L) => I +Int 1 ...</k> <store>... L |-> (I:Int => I +Int 1) ...</store> rule I1 + I2 => I1 +Int I2 rule Str1 + Str2 => Str1 +String Str2 rule I1 - I2 => I1 -Int I2 rule I1 * I2 => I1 *Int I2 rule I1 / I2 => I1 /Int I2 requires I2 =/=K 0 rule I1 % I2 => I1 %Int I2 requires I2 =/=K 0 rule - I => 0 -Int I rule I1 < I2 => I1 <Int I2 rule I1 <= I2 => I1 <=Int I2 rule I1 > I2 => I1 >Int I2 rule I1 >= I2 => I1 >=Int I2 rule V1:Val == V2:Val => V1 ==K V2 rule V1:Val != V2:Val => V1 =/=K V2 rule ! 
T => notBool(T) rule true && E => E rule false && _ => false rule true || _ => true rule false || E => E rule V:Val[N1:Int, N2:Int, Vs:Vals] => V[N1][N2, Vs] [anywhere] rule array(_:Type, L:Int, M:Int)[N:Int] => lookup(L +Int N) requires N >=Int 0 andBool N <Int M [anywhere] rule sizeOf(array(_,_,N)) => N syntax Val ::= nothing(Type) rule <k> return; => return nothing(T); ...</k> <returnType> T </returnType> rule <k> read() => I ...</k> <input> ListItem(I:Int) => .List ...</input> context (HOLE => lvalue(HOLE)) = _ rule {} => .K rule <k> { S } => S ~> setEnv(Env) ...</k> <env> Env </env> rule S1:Stmt S2:Stmt => S1 ~> S2 rule _:Val; => .K rule if ( true) S else _ => S rule if (false) _ else S => S rule while (E) S => if (E) {S while(E)S} rule <k> print(V:Val, Es => Es); ...</k> <output>... .List => ListItem(V) </output> requires typeOf(V) ==K int orBool typeOf(V) ==K string rule print(.Vals); => .K rule (<thread>... <k>.K</k> <holds>H</holds> <id>T</id> ...</thread> => .Bag) <busy> Busy => Busy -Set keys(H) </busy> <terminated>... .Set => SetItem(T) ...</terminated> rule <k> join T:Int; => .K ...</k> <terminated>... SetItem(T) ...</terminated> rule <k> acquire V:Val; => .K ...</k> <holds>... .Map => V |-> 0 ...</holds> <busy> Busy (.Set => SetItem(V)) </busy> requires (notBool(V in Busy:Set)) rule <k> acquire V; => .K ...</k> <holds>... V:Val |-> (N:Int => N +Int 1) ...</holds> rule <k> release V:Val; => .K ...</k> <holds>... V |-> (N => N:Int -Int 1) ...</holds> requires N >Int 0 rule <k> release V; => .K ...</k> <holds>... V:Val |-> 0 => .Map ...</holds> <busy>... SetItem(V) => .Set ...</busy> rule <k> rendezvous V:Val; => .K ...</k> <k> rendezvous V; => .K ...</k>
syntax Stmt ::= mkDecls(Params,Vals) [function] rule mkDecls((T:Type X:Id, Ps:Params), (V:Val, Vs:Vals)) => T X=V; mkDecls(Ps,Vs) rule mkDecls(.Params,.Vals) => {} syntax Exp ::= lookup(Int) rule <k> lookup(L) => V ...</k> <store>... L |-> V:Val ...</store> syntax KItem ::= setEnv(Map) rule <k> setEnv(Env) => .K ...</k> <env> _ => Env </env> rule (setEnv(_) => .K) ~> setEnv(_) syntax Exp ::= lvalue(K) syntax Val ::= loc(Int) rule <k> lvalue(X:Id => loc(L)) ...</k> <env>... X |-> L:Int ...</env> context lvalue(_::Exp[HOLE::Exps]) context lvalue(HOLE::Exp[_::Exps]) rule lvalue(lookup(L:Int) => loc(L)) syntax Type ::= Type "<" Vals ">" [function] rule T:Type<_,Vs:Vals> => T[]<Vs> rule T:Type<.Vals> => T syntax Map ::= Int "..." Int "|->" K [function] rule N...M |-> _ => .Map requires N >Int M rule N...M |-> K => N |-> K (N +Int 1)...M |-> K requires N <=Int M syntax Type ::= typeOf(K) [function] rule typeOf(_:Int) => int rule typeOf(_:Bool) => bool rule typeOf(_:String) => string rule typeOf(array(T,_,_)) => (T[]) rule typeOf(undefined(T)) => T rule typeOf(nothing(T)) => T syntax Types ::= getTypes(Params) [function] rule getTypes(T:Type _:Id) => T, .Types rule getTypes(T:Type _:Id, P, Ps) => T, getTypes(P,Ps) rule getTypes(.Params) => void, .Types
We extend/change the semantics of several SIMPLE constructs in order
to take advantage of the richer KOOL semantic infrastructure and thus
get more from the existing SIMPLE constructs.
Program initialization is like in untyped KOOL.
syntax KItem ::= "execute" rule <k> execute => new Main(.Exps); </k> <env> .Map </env>
The only change to untyped KOOL's values is that method closures are
now typed (their first argument holds their type):
syntax Val ::= objectClosure(Id,List) | methodClosure(Type,Id,Int,Params,Stmt)
The type held by a method closure will be the entire type of the
method, not only its result type like the lambda-closure of typed
SIMPLE. The reason for this change comes from the need to
dynamically upcast values when passed to contexts where values of
superclass types are expected; since we want method closures to be
first-class-citizen values in our language, we have to be able to
dynamically upcast them, and in order to do that elegantly it is
convenient to store the entire "current type" of the method closure
instead of just its result type. Note that this was unnecessary in
the semantics of the dynamically typed SIMPLE language.
Method closure application also needs to set a new return type in
the returnType cell, like in dynamically typed SIMPLE, in order
for the values returned by its body to be checked against the return
type of the method. To do this correctly, we also need to stack the
current status of the returnType cell and then pop it when the
method returns. We have to do the same with the current object
environment, so we group them together in the stack frame.
syntax KItem ::= fstackFrame(Map, K, List, Type, K)

rule <k> methodClosure(_->T,Class,OL,Ps,S)(Vs:Vals) ~> K
      => mkDecls(Ps,Vs) S return; </k>
     <env> Env => .Map </env>
     <store>... OL |-> objectClosure(_, EStack)...</store>
     //<br/> // TODO(KORE): support latex annotations #1799
     <control>
       <fstack> .List => ListItem(fstackFrame(Env, K, XS, T', <crntObj> Obj' </crntObj>)) ...</fstack>
       <xstack> XS </xstack>
       <returnType> T' => T </returnType>
       <crntObj> Obj' => <crntClass> Class </crntClass> <envStack> EStack </envStack> </crntObj>
     </control>
At method return, we have to check that the type of the returned
value is a subtype of the expected return type. Moreover, if that is
the case, then we also upcast the returned value to one of the
expected type. The computation item unsafeCast(V,T) changes
the type of V to T without any additional checks; however, it only
does so when V is an object or a method, otherwise it returns V
unchanged.
rule <k> return V:Val; ~> _
      => subtype(typeOf(V), T) ~> true? ~> unsafeCast(V, T) ~> K </k>
     <control>
       <fstack> ListItem(fstackFrame(Env, K, XS, RT, <crntObj> CO </crntObj>)) => .List ...</fstack>
       <xstack> _ => XS </xstack>
       <returnType> T:Type => RT </returnType>
       <crntObj> _ => CO </crntObj>
     </control>
     <env> _ => Env </env>
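The save and restore discipline for the expected return type can be sketched in Python as follows. This is only an illustration: the Control class, the BASE hierarchy and is_subtype are hypothetical stand-ins for the fstack and returnType cells and for the subtype check, and the sketch raises TypeError where the K semantics would get stuck.

```python
BASE = {"C": "B", "B": "Object"}  # assumed class hierarchy: C extends B


def is_subtype(t1, t2):
    """Walk up the extends chain from t1 until t2 is found or the chain ends."""
    while t1 != t2 and t1 in BASE:
        t1 = BASE[t1]
    return t1 == t2


class Control:
    """Hypothetical stand-in for the fstack and returnType cells."""

    def __init__(self):
        self.fstack = []
        self.return_type = "void"

    def call(self, declared_return_type):
        # Save the caller's expected return type, then install the callee's.
        self.fstack.append(self.return_type)
        self.return_type = declared_return_type

    def ret(self, value_type):
        expected = self.return_type
        self.return_type = self.fstack.pop()  # restore the caller's type
        if not is_subtype(value_type, expected):
            raise TypeError(f"{value_type} is not a subtype of {expected}")
        return expected  # the returned value is upcast to this type


ctl = Control()
ctl.call("B")            # invoke a method declared to return B
print(ctl.ret("C"))      # B
print(ctl.return_type)   # void
```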
Typed KOOL allows assigning subtype instance values to supertype
lvalues. The semantics of assignment below is similar in spirit to
dynamically typed SIMPLE's, but a check is performed that the assigned
value's type is a subtype of the location's type. If that is the
case, then the assigned value is returned as a result and stored, but
it is upcast appropriately first, so the context will continue to see
a value of the expected type of the location. Note that the type of a
location is implicit in the type of its contents and it never changes
during the execution of a program; its type is assigned when the
location is allocated and initialized, and then only type-preserving
values are allowed to be stored in each location.
rule <k> loc(L) = V:Val => subtype(typeOf(V),typeOf(V')) ~> true? ~> unsafeCast(V, typeOf(V')) ...</k> <store>... L |-> (V' => unsafeCast(V, typeOf(V'))) ...</store>
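As a rough Python illustration of this assignment policy, the sketch below keeps each stored value as a (type, payload) pair, so that the type of a location is simply the type of its contents. The names BASE, is_subtype and assign are assumptions for the example. The sketch raises TypeError where the K semantics would get stuck.

```python
BASE = {"C": "B", "B": "Object"}  # assumed class hierarchy: C extends B


def is_subtype(t1, t2):
    """Walk up the extends chain from t1 until t2 is found or the chain ends."""
    while t1 != t2 and t1 in BASE:
        t1 = BASE[t1]
    return t1 == t2


def assign(store, loc, value):
    """Store `value` at `loc`, upcast to the type the location already has."""
    value_type, payload = value
    slot_type, _old_payload = store[loc]
    if not is_subtype(value_type, slot_type):
        raise TypeError(f"cannot store {value_type} in a {slot_type} location")
    store[loc] = (slot_type, payload)  # the location keeps its type
    return store[loc]                  # the context sees the upcast value


store = {0: ("B", None)}
print(assign(store, 0, ("C", "obj1")))  # ('B', 'obj1')
```

Because the stored value is always retagged with the location's type, repeated assignments can never change that type.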
Exceptions are propagated now until a catch that can handle them is
encountered.
syntax KItem ::= xstackFrame(Param, Stmt, K, Map, K)
syntax KItem ::= "popx"

rule <k> (try S1 catch(P) S2 => S1 ~> popx) ~> K </k>
     <control>
       <xstack> .List => ListItem(xstackFrame(P, S2, K, Env, C)) ...</xstack>
       C
     </control>
     <env> Env </env>

rule <k> popx => .K ...</k>
     <xstack> ListItem(_) => .List ...</xstack>

rule <k> throw V:Val; ~> _
      => if (subtype(typeOf(V),T)) { T X = V; S2 } else { throw V; } ~> K </k>
     <control>
       <xstack> ListItem(xstackFrame(T:Type X:Id, S2, K, Env, C)) => .List ...</xstack>
       (_ => C)
     </control>
     <env> _ => Env </env>
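The propagation behavior of the throw rule can be sketched in Python as a loop over a handler stack. This is a simplified model under assumed names (BASE, is_subtype, throw): each handler is reduced to a (parameter type, label) pair, and the top of the stack is the end of the list.

```python
BASE = {"C": "B", "B": "Object"}  # assumed class hierarchy: C extends B


def is_subtype(t1, t2):
    """Walk up the extends chain from t1 until t2 is found or the chain ends."""
    while t1 != t2 and t1 in BASE:
        t1 = BASE[t1]
    return t1 == t2


def throw(value_type, xstack):
    """Pop handlers until one whose parameter type accepts the thrown value."""
    while xstack:
        param_type, handler = xstack.pop()
        if is_subtype(value_type, param_type):
            return handler
        # Otherwise the value is rethrown to the next enclosing handler.
    return None  # no handler left: the K semantics would get stuck


xstack = [("B", "outer handler"), ("D", "inner handler")]
print(throw("C", xstack))  # outer handler
print(xstack)              # []
```

The inner handler is discarded even though it does not match, just as the K rule pops the frame before rethrowing.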
Thread spawning is like in untyped KOOL.
rule <thread>... <k> spawn S => !T:Int ...</k> <env> Env </env> <crntObj> Obj </crntObj> ...</thread> (.Bag => <thread>... <k> S </k> <env> Env </env> <id> !T </id> <crntObj> Obj </crntObj> ...</thread>)
Class declaration is like in untyped KOOL.
rule <k> class Class1 extends Class2 { S } => .K ...</k> <classes>... (.Bag => <classData> <className> Class1 </className> <baseClass> Class2 </baseClass> <declarations> S </declarations> </classData>) ...</classes>
Methods are now typed and we need to store their types in their
closures, so that their type contract can be checked at invocation
time. The rule below is conceptually similar to that of untyped KOOL;
the only difference is the addition of the types.
rule <k> T:Type F:Id(Ps:Params) S => .K ...</k> <crntClass> C </crntClass> <location> OL </location> <env> Env => Env[F <- L] </env> <store>... .Map => L|->methodClosure(getTypes(Ps)->T,C,OL,Ps,S) ...</store> <nextLoc> L => L +Int 1 </nextLoc>
The semantics of new in dynamically typed KOOL is also
similar to that in untyped KOOL, the main difference being the
management of the return types. Indeed, when a new object is created
we also have to stack the current type in the returnType cell in
order for it to be recovered after the creation of the new object. Only the
first rule below needs to be changed; the others are identical to
those in untyped KOOL.
syntax KItem ::= envStackFrame(Id, Map) rule <k> new Class:Id(Vs:Vals) ~> K => create(Class) ~> (storeObj ~> ((Class(Vs)); return this;)) </k> <env> Env => .Map </env> <nextLoc> L:Int => L +Int 1 </nextLoc> //<br/> // TODO(KORE): support latex annotations #1799 <control> <xstack> XS </xstack> <crntObj> Obj => <crntClass> Object </crntClass> <envStack> ListItem(envStackFrame(Object, .Map)) </envStack> <location> L </location> </crntObj> <returnType> T => Class </returnType> <fstack> .List => ListItem(fstackFrame(Env, K, XS, T, <crntObj>Obj</crntObj>)) ...</fstack> </control> syntax KItem ::= create(Id) rule <k> create(Class:Id) => create(Class1) ~> setCrntClass(Class) ~> S ~> addEnvLayer ...</k> <className> Class </className> <baseClass> Class1:Id </baseClass> <declarations> S </declarations> rule <k> create(Object) => .K ...</k> syntax KItem ::= setCrntClass(Id) rule <k> setCrntClass(C) => .K ...</k> <crntClass> _ => C </crntClass> syntax KItem ::= "addEnvLayer" rule <k> addEnvLayer => .K ...</k> <env> Env => .Map </env> <crntClass> Class:Id </crntClass> <envStack> .List => ListItem(envStackFrame(Class, Env)) ...</envStack> syntax KItem ::= "storeObj" rule <k> storeObj => .K ...</k> <crntObj> <crntClass> Class </crntClass> <envStack> EStack </envStack> (<location> L:Int </location> => .Bag) </crntObj> <store>... .Map => L |-> objectClosure(Class, EStack) ...</store>
The semantics of this is like in untyped KOOL.
rule <k> this => objectClosure(Class, EStack) ...</k> <crntObj> <crntClass> Class </crntClass> <envStack> EStack </envStack> ... </crntObj>
Object member access is like in untyped KOOL.
rule <k> X:Id => this . X ...</k> <env> Env:Map </env> requires notBool(X in keys(Env)) context HOLE . _::Id requires (HOLE =/=K super) /* rule objectClosure(<crntObj> <crntClass> Class:Id </crntClass> <envStack>... ListItem((Class,EnvC:EnvCell)) EStack </envStack> </crntObj>) . X:Id => lookupMember(<envStack> ListItem((Class,EnvC)) EStack </envStack>, X) */ rule objectClosure(Class:Id, ListItem(envStackFrame(Class,Env)) EStack) . X:Id => lookupMember(ListItem(envStackFrame(Class,Env)) EStack, X) rule objectClosure(Class:Id, (ListItem(envStackFrame(Class':Id,_)) => .List) _EStack) . _X:Id requires Class =/=K Class' /* rule <k> super . X => lookupMember(<envStack>EStack</envStack>, X) ...</k> <crntClass> Class </crntClass> <envStack>... ListItem((Class,EnvC:EnvCell)) EStack </envStack> */ rule <k> super . X => lookupMember(EStack, X) ...</k> <crntClass> Class:Id </crntClass> <envStack> ListItem(envStackFrame(Class,_)) EStack </envStack> rule <k> super . _X ...</k> <crntClass> Class:Id </crntClass> <envStack> (ListItem(envStackFrame(Class':Id,_)) => .List) _EStack </envStack> requires Class =/=K Class'
The method lookup is the same as in untyped KOOL.
rule <k> (X:Id => V)(_:Exps) ...</k> <env>... X |-> L ...</env> <store>... L |-> V:Val ...</store> rule <k> (X:Id => this . X)(_:Exps) ...</k> <env> Env </env> requires notBool(X in keys(Env)) context HOLE._::Id(_) requires HOLE =/=K super rule (objectClosure(_, EStack) . X => lookupMember(EStack, X:Id))(_:Exps) /* rule <k> (super . X => lookupMember(<envStack>EStack</envStack>,X))(_:Exps)...</k> <crntClass> Class </crntClass> <envStack>... ListItem((Class,_)) EStack </envStack> */ rule <k> (super . X => lookupMember(EStack,X))(_:Exps)...</k> <crntClass> Class:Id </crntClass> <envStack> ListItem(envStackFrame(Class,_)) EStack </envStack> rule <k> (super . _X)(_:Exps)...</k> <crntClass> Class:Id </crntClass> <envStack> (ListItem(envStackFrame(Class':Id,_)) => .List) _EStack </envStack> requires Class =/=K Class' // TODO(KORE): fix getKLabel #1801 rule (A:Exp(B:Exps))(C:Exps) => A(B) ~> #freezerFunCall(C) rule (A:Exp[B:Exps])(C:Exps) => A[B] ~> #freezerFunCall(C) rule V:Val ~> #freezerFunCall(C:Exps) => V(C) syntax KItem ::= "#freezerFunCall" "(" K ")" /* context HOLE(_:Exps) requires getKLabel HOLE ==KLabel '_`(_`) orBool getKLabel HOLE ==KLabel '_`[_`] */ rule <k> (lookup(L) => V)(_:Exps) ...</k> <store>... L |-> V:Val ...</store>
The semantics of instanceOf is like in untyped KOOL.
rule objectClosure(_, ListItem(envStackFrame(C,_)) _) instanceOf C => true rule objectClosure(_, (ListItem(envStackFrame(C::Id,_)) => .List) _) instanceOf C' requires C =/=K C' rule objectClosure(_, .List) instanceOf _ => false
Unlike in untyped KOOL, in typed KOOL we actually check that the object
can indeed be cast to the claimed type.
rule (C:Id) objectClosure(Irrelevant, EStack) => objectClosure(Irrelevant, EStack) instanceOf C ~> true? ~> objectClosure(C, EStack)
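A minimal Python sketch of this checked cast follows. It assumes an object is encoded as a pair of its current class and its list of (class, environment) frames; the names instance_of and checked_cast are hypothetical. The sketch raises TypeError where the K rule would get stuck on the true? guard.

```python
def instance_of(obj, cls):
    """True iff some layer of the object's environment stack belongs to cls."""
    _current_class, frames = obj
    return any(frame_cls == cls for frame_cls, _env in frames)


def checked_cast(cls, obj):
    """Change the object's current class to cls, but only if it has that layer."""
    if not instance_of(obj, cls):
        raise TypeError(f"object has no layer for class {cls}")
    _current_class, frames = obj
    return (cls, frames)  # the environment stack itself is left untouched


frames = [("C", {}), ("B", {}), ("Object", {})]
obj = ("C", frames)
as_b = checked_cast("B", obj)
print(as_b[0])                     # B
print(checked_cast("C", as_b)[0])  # C
```

Since the cast leaves the stack intact, an upcast followed by a downcast to the instance class succeeds in this model.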
Member names as lvalues are handled like in untyped KOOL.
rule <k> lvalue(X:Id => this . X) ...</k> <env> Env </env> requires notBool(X in keys(Env)) context lvalue((HOLE . _)::Exp) /* rule lvalue(objectClosure(<crntObj> <crntClass> C </crntClass> <envStack>... ListItem((C,EnvC:EnvCell)) EStack </envStack> </crntObj>) . X => lookupMember(<envStack> ListItem((C,EnvC)) EStack </envStack>, X)) */ rule lvalue(objectClosure(C:Id, ListItem(envStackFrame(C,Env)) EStack) . X => lookupMember(ListItem(envStackFrame(C,Env)) EStack, X)) rule lvalue(objectClosure(C, (ListItem(envStackFrame(C',_)) => .List) _EStack) . _X) requires C =/=K C'
The lookupMember function is like in untyped KOOL.
syntax Exp ::= lookupMember(List,Id) [function] rule lookupMember(ListItem(envStackFrame(_, X |-> L _)) _, X) => lookup(L) // TODO: fix rule below as shown once we support functions with deep rewrites // rule lookupMember(<envStack> ListItem((_, <env> Env </env>)) => .List // ...</envStack>, X) // requires notBool(X in keys(Env)) rule lookupMember(ListItem(envStackFrame(_, Env)) L, X) => lookupMember(L, X) requires notBool(X in keys(Env))
typeOf for the additional values
rule typeOf(objectClosure(C,_)) => C
rule typeOf(methodClosure(T:Type,_,_,_Ps:Params,_)) => T
The subclass relation induces a subtyping relation.
syntax Exp ::= subtype(Types,Types)

rule subtype(T:Type, T) => true

rule <k> subtype(C1:Id, C:Id) => subtype(C2, C) ...</k>
     <className> C1 </className>
     <baseClass> C2:Id </baseClass>
  requires C1 =/=K C

rule subtype(Object,Class:Id) => false
  requires Class =/=K Object

rule subtype(Ts1->T2,Ts1'->T2') => subtype(((T2)::Type,Ts1'),((T2')::Type,Ts1))

// Note that the following rule would be wrong!
// rule subtype(T[],T'[]) => subtype(T,T')

rule subtype((T:Type,Ts),(T':Type,Ts')) => subtype(T,T') && subtype(Ts,Ts')
  requires Ts =/=K .Types

rule subtype(.Types,.Types) => true
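The shape of this relation can be illustrated with a small Python model. The encoding is an assumption made for the example: class and primitive types are strings, a function type is the tuple ("fun", argument types, result type), an array type is ("array", element type), and BASE is a hypothetical acyclic class hierarchy. The sketch returns False in cases where the K rules would simply get stuck.

```python
BASE = {"C": "B", "B": "Object"}  # assumed class hierarchy: C extends B


def fun(args, result):
    """Build a function type with the given argument and result types."""
    return ("fun", tuple(args), result)


def subtype(t1, t2):
    if t1 == t2:
        return True
    if isinstance(t1, str) and isinstance(t2, str):
        # Class types: walk up the extends chain of t1 looking for t2.
        while t1 in BASE:
            t1 = BASE[t1]
            if t1 == t2:
                return True
        return False
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] == "fun"):
        (_, args1, res1), (_, args2, res2) = t1, t2
        # Covariant in the result, contravariant in the arguments.
        return (len(args1) == len(args2)
                and subtype(res1, res2)
                and all(subtype(a2, a1) for a1, a2 in zip(args1, args2)))
    return False  # in particular, arrays are not covariant


print(subtype("C", "B"))                                # True
print(subtype(fun(["B"], "C"), fun(["C"], "B")))        # True
print(subtype(("array", "C"), ("array", "B")))          # False
```

The array case shows why the commented-out rule would be wrong: a C array used as a B array could later be asked to hold a B that is not a C.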
The unsafeCast function performs unsafe casting. One should only use it
in combination with the subtype relation above.
syntax Val ::= unsafeCast(Val,Type) [function]
rule unsafeCast(objectClosure(_,EStack), C:Id) => objectClosure(C,EStack)
rule unsafeCast(methodClosure(_T',C,OL,Ps,S), T) => methodClosure(T,C,OL,Ps,S)
rule unsafeCast(V:Val, T:Type) => V
  requires typeOf(V) ==K T
A generic computational guard: it allows the computation to continue
only if a prefix guard evaluates to true.
syntax KItem ::= "true?"
rule true ~> true? => .K

endmodule
Go to Lesson 3, KOOL typed static.
Author: Grigore Roșu (grosu@illinois.edu)
Organization: University of Illinois at Urbana-Champaign
Author: Traian Florin Șerbănuță (traian.serbanuta@unibuc.ro)
Organization: University of Bucharest
This is the K static semantics of the typed KOOL language.
It extends the static semantics of typed SIMPLE with static semantics
for the object-oriented constructs. Also, the static semantics of
some of the existing SIMPLE constructs needs to change, in order to
become more generous with regard to the set of accepted programs,
mostly due to subtyping. For example, the assignment construct
x = e required that both the variable x and the
expression e had the same type in SIMPLE. In KOOL, the type
of e can be a subtype of the type of x.
Specifically, we define the following typing policy for KOOL,
everything else not mentioned below borrowing its semantics from
SIMPLE:
Each class C yields a homonymous type, which can be
explicitly used in programs to type variables and methods, possibly in
combination with other types.
Since now we have user-defined types, we check that each type
used in a KOOL program is well-formed, that is, it is constructed only
from primitive and class types corresponding to declared classes.
Class members and their types form a class type
environment. Each class will have such a type environment.
Each member in a class is allowed to be declared only once. Since in
KOOL we allow methods to be assigned to fields, we make no distinction
between field and method members; in other words, we reject programs
declaring both a field and a method with the same name.
If an identifier is not found in the local type environment, it
will be searched for in the current class type environment. If not
there, then it will be searched for in its superclass' type
environment. And so on and so forth. If not found until the
Object class is reached, a typing error is reported.
The assignment allows variables to be assigned values of
more concrete types. The result type of the assignment expression
construct will be the (more abstract) type of the assigned variable,
and not the (more concrete) type of the expression, like in Java.
Exceptions are changed (from SIMPLE) to allow throwing and
catching only objects, like in Java. Also, unlike in SIMPLE, we do
not check whether the type of the thrown exception matches the type of
the caught variable, because exceptions can be caught by other
try/catch blocks, even by ones in other methods. To avoid
having to annotate each method with what exceptions it can throw, we
prefer to not check the type safety of exceptions (although this is an
excellent homework!). We only check that the try block
type-checks and that the catch block type-checks after we bind
the caught variable to its claimed type.
Class declarations are not allowed to have any cycles in their
extends relation. Such cycles would lead to non-termination of new,
as they actually do in the dynamic semantics of KOOL, where no such
circularity checks are performed.
Methods overriding other methods should be in the right subtyping
relationship with the overridden methods: co-variant in the codomain
and contra-variant in the domain.
module KOOL-TYPED-STATIC-SYNTAX imports DOMAINS-SYNTAX
The syntax of statically typed KOOL is identical to that of
dynamically typed KOOL; both take the same programs as input.
What differs is the K strictness attributes. Like in statically
typed SIMPLE, almost all language constructs are strict now, since we
want each to type its arguments almost all the time. Like in the
other two KOOL definitions, we prefer to copy and then modify/extend
the syntax of statically typed SIMPLE.
Note: This paragraph is old, now we can do things better. We keep
it here only for historical reasons, to see how much we used to suffer 😃
Annoying K-tool technical problem:
Currently, the K tool treats the "non-terminal" productions (i.e.,
productions consisting of just one non-terminal), also called
"subsorting" productions, differently from the other productions.
Specifically, it does not insert a node in the AST for them. This may
look desirable at first, but it has a big problem: it does not allow
us to treat the subsort differently in different contexts. For
example, since we want Id to be both a type (a class name) and a
program variable, and since we want expressions to reduce to their
types, we are in an impossible situation in which we do not know how
to treat an identifier in the semantics: as a type, i.e., a result of
computations, or as a program variable, i.e., a non-result. Ideally,
we would like to tag the identifiers at parse-time with their local
interpretation, but that, unfortunately, is not possible with the
current parsing capabilities of the K tool, because it requires
inserting additional information in the AST for the subsort productions.
This will be fixed soon. Until then, unfortunately, we have to do the
job of the parser manually. Instead of subsorting Id directly to
Type, we "wrap" it first, say with a wrapper called class(...),
exactly how the parser should have done.
The major drawback of this is that all the typed KOOL programs
in kool/typed/programs need to also be modified to always
declare class types accordingly. The modified programs can be found
in kool/typed/static/programs. So make sure you execute the
static semantics of KOOL using the modified programs. To avoid seeing
the wrapper in the generated documentation, we associate with it an
"invisibility" latex attribute below.
syntax Id ::= "Object" [token] | "Main" [token]
syntax Type ::= "void" | "int" | "bool" | "string"
              | Id [klabel("class"), symbol, avoid]  // see next
              | Type "[" "]"
              | "(" Type ")" [bracket]
              > Types "->" Type
syntax Types ::= List{Type,","} [overload(exps)]
syntax Param ::= Type Id syntax Params ::= List{Param,","} syntax Stmt ::= Type Exps ";" [avoid] | Type Id "(" Params ")" Block | "class" Id Block | "class" Id "extends" Id Block
syntax FieldReference ::= Exp "." Id [strict(1)]
syntax ArrayReference ::= Exp "[" Exps "]" [strict]
syntax Exp ::= Int | Bool | String | Id
             | "this"
             | "super"
             | "(" Exp ")" [bracket]
             | "++" Exp
             | Exp "instanceOf" Id [strict(1)]
             | "(" Id ")" Exp [strict(2)]
             | "new" Id "(" Exps ")" [strict(2)]
             > Exp "(" Exps ")" [strict]
             | "-" Exp [strict]
             | "sizeOf" "(" Exp ")" [strict]
             | "read" "(" ")"
             > left:
               Exp "*" Exp [strict, left]
             | Exp "/" Exp [strict, left]
             | Exp "%" Exp [strict, left]
             > left:
               Exp "+" Exp [strict, left]
             | Exp "-" Exp [strict, left]
             > non-assoc:
               Exp "<" Exp [strict, non-assoc]
             | Exp "<=" Exp [strict, non-assoc]
             | Exp ">" Exp [strict, non-assoc]
             | Exp ">=" Exp [strict, non-assoc]
             | Exp "==" Exp [strict, non-assoc]
             | Exp "!=" Exp [strict, non-assoc]
             > "!" Exp [strict]
             > left:
               Exp "&&" Exp [strict, left]
             | Exp "||" Exp [strict, left]
             > "spawn" Block  // not strict: to check return and exceptions
             > Exp "=" Exp [strict(2), right]
syntax Exp ::= FieldReference | ArrayReference
syntax priority _.__KOOL-TYPED-STATIC-SYNTAX > _[_]_KOOL-TYPED-STATIC-SYNTAX > _(_)_KOOL-TYPED-STATIC-SYNTAX
syntax Exps ::= List{Exp,","} [strict, overload(exps)]
syntax Block ::= "{" "}" | "{" Stmt "}" syntax Stmt ::= Block | Exp ";" [strict] | "if" "(" Exp ")" Block "else" Block [avoid, strict] | "if" "(" Exp ")" Block [macro] | "while" "(" Exp ")" Block [strict] | "for" "(" Stmt Exp ";" Exp ")" Block [macro] | "return" Exp ";" [strict] | "return" ";" | "print" "(" Exps ")" ";" [strict] | "try" Block "catch" "(" Param ")" Block [strict(1)] | "throw" Exp ";" [strict] | "join" Exp ";" [strict] | "acquire" Exp ";" [strict] | "release" Exp ";" [strict] | "rendezvous" Exp ";" [strict] syntax Stmt ::= Stmt Stmt [seqstrict, right]
rule if (E) S => if (E) S else {} rule for(Start Cond; Step) {S:Stmt} => {Start while(Cond){S Step;}} rule T:Type E1:Exp, E2:Exp, Es:Exps; => T E1; T E2, Es; [anywhere] rule T:Type X:Id = E; => T X; X = E; [anywhere] rule class C:Id S => class C extends Object S endmodule
We first discuss the configuration, then give the static semantics
taken over unchanged from SIMPLE, then discuss the static semantics of
SIMPLE syntactic constructs that need to change, and in the end we
discuss the static semantics and additional checks specifically
related to the KOOL proper syntax.
module KOOL-TYPED-STATIC imports KOOL-TYPED-STATIC-SYNTAX imports DOMAINS
The configuration of our type system consists of a tasks cell with the
same meaning as in statically typed SIMPLE, of an output cell streamed
to the standard output that will be used to display typing error
messages, and of a cell classes holding data about each class in a
separate classData cell. The task cells now have two additional
optional subcells, namely ctenvT and inClass. The former holds a
temporary class type environment; its contents will be transferred
into the ctenv cell of the corresponding class as soon as all the
fields and methods in the task are processed. In fact, there will be
three types of tasks in the subsequent semantics, each determined by
the subset of cells that it holds:
Main task, holding only a k cell holding the original program as a
set of classes. The role of this task is to process each class,
generating a class task (see next) for each.
Class task, holding k, ctenvT, and inClass subcells. The role of
this task type is to process a class' contents, generating a class
type environment in the ctenvT cell and a method task (see next) for
each method in the class. To avoid interference with object member
lookup rules below, it is important to add the class type environment
to a class atomically; this is the reason for which we use ctenvT
temporary cells within class tasks (instead of adding each member
incrementally to the class' type environment).
Method task, holding k, tenv and returnType cells. These tasks are
similar to SIMPLE's function tasks, so we do not discuss them here
any further.
Each classData cell holds its name (in the className cell) and the
name of the class it extends (in the baseClass cell), as well as its
type environment (in the ctenv cell) and the set of all its
superclasses (in the baseClasses cell). The latter is useful, for
example, for checking whether there are cycles in the class extends
relation.
configuration <T multiplicity="?" color="yellow">
                <tasks color="orange" multiplicity="?">
                  <task multiplicity="*" color="yellow" type="Set">
                    <k color="green"> $PGM:Stmt </k>
                    <tenv multiplicity="?" color="cyan"> .Map </tenv>
                    <ctenvT multiplicity="?" color="blue"> .Map </ctenvT>
                    <returnType multiplicity="?" color="black"> void </returnType>
                    <inClass multiplicity="?" color="Fuchsia"> .K </inClass>
                  </task>
                </tasks>
// <br/>
                <classes color="Fuchsia">
                  <classData multiplicity="*" type="Map">
                    <className color="Fuchsia"> Object </className>
                    <baseClass color="Fuchsia"> .K </baseClass>
                    <baseClasses color="Fuchsia"> .Set </baseClasses>
                    <ctenv multiplicity="?" color="blue"> .Map </ctenv>
                  </classData>
                </classes>
              </T>
              <output color="brown" stream="stdout"> .List </output>
The syntax and rules below are borrowed unchanged from statically
typed SIMPLE, so we do not discuss them much here.
syntax Exp ::= Type
syntax Exps ::= Types
syntax BlockOrStmtType ::= "block" | "stmt"
syntax Type ::= BlockOrStmtType
syntax Block ::= BlockOrStmtType
syntax KResult ::= Type | Types  // TODO: should not be needed
context _:Type _::Exp[HOLE::Exps];
rule T:Type E:Exp[int,Ts:Types]; => T[] E[Ts];
rule T:Type E:Exp[.Types]; => T E;
rule <task>... <k> _:BlockOrStmtType </k> <tenv> _ </tenv> ...</task> => .Bag
rule _:Int => int
rule _:Bool => bool
rule _:String => string
rule <k> X:Id => T ...</k> <tenv>... X |-> T ...</tenv>
context ++(HOLE => ltype(HOLE))
rule ++ int => int
rule int + int => int
rule string + string => string
rule int - int => int
rule int * int => int
rule int / int => int
rule int % int => int
rule - int => int
rule int < int => bool
rule int <= int => bool
rule int > int => bool
rule int >= int => bool
rule T:Type == T => bool
rule T:Type != T => bool
rule bool && bool => bool
rule bool || bool => bool
rule ! bool => bool
rule (T[])[int, Ts:Types] => T[Ts]
rule T:Type[.Types] => T
rule sizeOf(_T[]) => int
rule read() => int
rule print(T:Type, Ts => Ts); requires T ==K int orBool T ==K string
rule print(.Types); => stmt
context (HOLE => ltype(HOLE)) = _
rule <k> return; => stmt ...</k> <returnType> _ </returnType>
rule {} => block
rule <task> <k> {S:Stmt} => block ...</k> <tenv> Rho </tenv> R </task>
     (.Bag => <task> <k> S </k> <tenv> Rho </tenv> R </task>)
rule _:Type; => stmt
rule if (bool) block else block => stmt
rule while (bool) block => stmt
rule join int; => stmt
rule acquire _:Type; => stmt
rule release _:Type; => stmt
rule rendezvous _:Type; => stmt
syntax Stmt ::= BlockOrStmtType
rule _:BlockOrStmtType _:BlockOrStmtType => stmt
syntax Stmt ::= mkDecls(Params) [function]
rule mkDecls(T:Type X:Id, Ps:Params) => T X; mkDecls(Ps)
rule mkDecls(.Params) => {}
syntax LValue ::= Id | FieldReference | ArrayReference
syntax Exp ::= LValue
syntax Exp ::= ltype(Exp)
// We would like to say:
//   context ltype(HOLE:LValue)
// but we currently cannot type the HOLE
context ltype(HOLE) requires isLValue(HOLE)
// OLD approach:
// syntax Exp ::= ltype(Exp) [function]
// rule ltype(X:Id) => X
// rule ltype(E:Exp [Es:Exps]) => E[Es]
syntax Types ::= getTypes(Params) [function]
rule getTypes(T:Type _:Id) => T, .Types
rule getTypes(T:Type _:Id, P, Ps) => T, getTypes(P,Ps)
rule getTypes(.Params) => void, .Types
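For readers less familiar with K's list sorts, the behavior of getTypes can be mirrored in ordinary code. The following is a minimal Python sketch and not part of the K definition: parameters are modeled as (type, name) pairs, and the function name get_types is a hypothetical stand-in. It reproduces the one non-obvious case, namely that an empty parameter list yields the single type void.

```python
# Hypothetical Python model of getTypes; parameters are (type, name) pairs.

def get_types(params):
    """Return the list of parameter types, or ["void"] for no parameters."""
    if not params:
        return ["void"]          # mirrors: getTypes(.Params) => void, .Types
    return [param_type for param_type, _name in params]


no_params = get_types([])
two_params = get_types([("int", "x"), ("bool", "flag")])
```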
Below we give the new static semantics for language constructs that
come from SIMPLE, but whose SIMPLE static semantics was too
restrictive or too permissive and thus had to change.
Since we can define new types in KOOL (corresponding to classes), the
variable declaration needs to now check that the claimed types exist.
The operation checkType, defined at the end of this module,
checks whether the argument type is correct (it actually works with
lists of types as well).
rule <k> T:Type X:Id; => checkType(T) ~> stmt ...</k> <tenv> Rho => Rho[X <- T] </tenv>
In class tasks, variable declarations mean class member declarations.
Since we reduce method declarations to variable declarations (see
below), a variable declaration in a class task can mean either a field
or a method declaration. Unlike local variable declarations, which
can shadow previous homonymous local or member declarations, member
declarations are regarded as a set, so we disallow multiple
declarations for the same member (one could improve upon this, like in
Java, by treating members with different types or number of arguments
as different, etc., but we do not do it here). We also issue an error
message if one attempts to redeclare the same class member. The
framed variable declaration in the second rule below should be read
"stuck". In fact, it is nothing but a unary operation called
stuck, which takes a K-term as argument and does nothing
with it; this stuck operation is displayed as a frame in this
PDF document because of its latex attribute (see the ASCII .k file,
at the end of this module).
rule <k> T:Type X:Id; => checkType(T) ~> stmt ...</k>
     <ctenvT> Rho (.Map => X |-> T) </ctenvT>
  requires notBool(X in keys(Rho))
rule <k> T:Type X:Id; => stuck(T X;) ...</k>
     <ctenvT>... X |-> _ ...</ctenvT>
     <inClass> C:Id </inClass>
// <br/>
     <output>... .List => ListItem("Member \"" +String Id2String(X) +String "\" declared twice in class \"" +String Id2String(C) +String "\"!\n") </output>
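The effect of the two rules above can be illustrated outside K. The following Python sketch is only a model under stated assumptions: the temporary class type environment is a plain dictionary, declare_member is a hypothetical helper name, and the error text imitates the message built in the second rule.

```python
# Illustrative model only; names here are hypothetical, not part of the K definition.

def declare_member(ctenv_t, class_name, member, member_type):
    """Add `member` to the temporary class type environment `ctenv_t`.

    Returns None on success, or an error message if the member was already
    declared (fields and methods share a single namespace).
    """
    if member in ctenv_t:
        return f'Member "{member}" declared twice in class "{class_name}"!'
    ctenv_t[member] = member_type
    return None


env = {}
first = declare_member(env, "C", "x", "int")
# A method named x clashes with the field x, since both are just members.
second = declare_member(env, "C", "x", ("fun", ("int",), "int"))
```

Note that the second declaration is rejected even though its type differs, which matches the policy of treating members as a set keyed by name only.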
A method declaration requires two conceptual checks to be performed:
first, that the method's type is consistent with the type of the
homonymous method that it overrides, if any; and second, that its body
types correctly. At the same time, it should also be added to the
type environment of its class. The first conceptual task is performed
using the checkMethod operation defined below, and the second
by generating a corresponding method task. To add it to the class
type environment, we take advantage of the fact that KOOL is higher
order and reduce the problem to a field declaration problem, which we
have already defined. The role of the ctenvT cell in the
rule below is to structurally ensure that the method declaration takes
place in a class task (we do not want to allow methods to be declared,
for example, inside other methods).
rule <k> T:Type F:Id(Ps:Params) S => checkMethod(F, getTypes(Ps)->T, C') ~> getTypes(Ps)->T F; ...</k>
// <br/>
     <inClass> C </inClass>
     <ctenvT> _ </ctenvT>  // to ensure we are in a class pass
     <className> C </className>
     <baseClass> C' </baseClass>
// <br/>
     (.Bag => <task>
                <k> mkDecls(Ps) S </k>
                <inClass> C </inClass>
                <tenv> .Map </tenv>
                <returnType> T </returnType>
              </task>)
A more concrete value is allowed to be assigned to a more abstract
variable. The operation checkSubtype is defined at the end
of the module and it also works with pairs of lists of types.
rule T:Type = T':Type => checkSubtype(T', T) ~> T
Methods can be applied on values of more concrete types than their
arguments:
rule (Ts:Types -> T:Type) (Ts':Types) => checkSubtype(Ts',Ts) ~> T
Similarly, we allow values of more concrete types to be returned by
methods:
rule <k> return T:Type; => checkSubtype(T,T') ~> stmt ...</k> <returnType> T':Type </returnType>
Exceptions can throw and catch values of any type. Since, unlike in Java,
KOOL's methods do not declare the exception types that they can throw,
we cannot test the full type safety of exceptions. Instead, we
only check that the try and the catch statements type correctly.
rule try block catch(T:Type X:Id) S => {T X; S} rule throw _T:Type ; => stmt
The spawned cell needs to also be passed the parent's class.
// explain why
rule <k> spawn S:Block => int ...</k>
     <tenv> Rho </tenv>
     <inClass> C </inClass>
     (.Bag => <task>
                <k> S </k>
                <tenv> Rho </tenv>
                <inClass> C </inClass>
              </task>)
We process each class in the main task, adding the corresponding data
into its classData cell and also adding a class task for it. We
also perform some well-formedness checks on the class hierarchy.
Initiate class processing
We create a classData cell and a class task for each class. Also, we start
the class task with a check that the class it extends is declared
(this delays the task until that class is processed using another
instance of this rule).
// There seems to be some error with the configuration concretization,
// as the rule below does not work when rewriting . to both the task
// and the class cells; I had to include two separate . rewrites
// TODO: the following fails krun; see #2117
rule <task> <k> class C:Id extends C':Id { S:Stmt } => stmt ...</k> </task>
     (.Bag => <classData>...
                <className> C </className>
                <baseClass> C' </baseClass>
              ...</classData>)
// <br/>
     (.Bag => <task>
                <k> checkType(`class`(C')) ~> S </k>
                <inClass> C </inClass>
                <ctenvT> .Map </ctenvT>
              </task>)
// You may want to try the thing below, but that failed, too
/*
syntax Type ::= "stmtStop"
rule <tasks>...
       <task> <k> class C:Id extends C':Id { S:Stmt } => stmtStop ...</k> </task>
       (.Bag => <task>
                  <k> checkType(`class`(C')) ~> S </k>
                  <inClass> C </inClass>
                  <ctenvT> .Map </ctenvT>
                </task>)
     ...</tasks>
     <classes>...
       .Bag => <classData>...
                 <className> C </className>
                 <baseClass> C' </baseClass>
               ...</classData>
     ...</classes>
// <br/>
*/
rule (<T>... <className> C </className> <className> C </className> ...</T> => .Bag) <output>... .List => ListItem("Class \"" +String Id2String(C) +String "\" declared twice!\n") </output>
Check for cycles in class hierarchy
We check for cycles in the class hierarchy by transitively closing the
class extends relation using the baseClasses cells, and checking that
a class will never appear in its own baseClasses cell. The first rule
below initiates the transitive closure of the superclass relation, the
second transitively closes it, and the third checks for cycles.
rule <baseClass> C </baseClass> <baseClasses> .Set => SetItem(C) </baseClasses> [priority(25)] rule <classData>... <baseClasses> SetItem(C) Cs:Set (.Set => SetItem(C')) </baseClasses> ...</classData> <classData>... <className>C</className> <baseClass>C'</baseClass> ...</classData> requires notBool(C' in (SetItem(C) Cs)) [priority(25)] rule (<T>... <className> C </className> <baseClasses>... SetItem(C) ...</baseClasses> ...</T> => .Bag) <output>... .List => ListItem("Class \"" +String Id2String(C) +String "\" is in a cycle!\n") </output> [priority(25)]
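The three rules above amount to a fixpoint computation followed by a membership test. The Python sketch below models that idea under stated assumptions: the hierarchy is given as a dictionary from each class to its direct superclass, Object has no entry, and the function names superclass_closure and classes_in_cycles are hypothetical.

```python
# Illustrative model of the transitive closure and cycle check; not K code.

def superclass_closure(base):
    """Map each class to the set of all its (transitive) superclasses.

    `base` maps a class name to its direct superclass; "Object" has no entry.
    """
    closure = {c: {b} for c, b in base.items()}   # first rule: seed with the direct base
    changed = True
    while changed:                                # second rule: close transitively
        changed = False
        for supers in closure.values():
            for s in list(supers):
                parent = base.get(s)
                if parent is not None and parent not in supers:
                    supers.add(parent)
                    changed = True
    return closure


def classes_in_cycles(base):
    """Return, sorted, the classes that appear among their own superclasses."""
    closure = superclass_closure(base)
    return sorted(c for c, supers in closure.items() if c in supers)  # third rule
```

Because a superclass is only added when it is not already present, the loop terminates even on cyclic input, which is the same role the side condition plays in the second K rule.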
To type new we only need to check that the class constructor
can be called with arguments of the given types, so we initiate a call
to the constructor method in the corresponding class. If that
succeeds, meaning that it types to stmt, then we discard the
stmt type and produce instead the corresponding class type of
the new object. The auxiliary discard operation is also defined
at the end of this module.
rule new C:Id(Ts:Types) => `class`(C) . C (Ts) ~> discard ~> `class`(C)
The typing rule for this is straightforward: reduce to the
current class type.
rule <k> this => `class`(C) ...</k> <inClass> C:Id </inClass>
Similarly, super types to the parent class type.
Note that for typing concerns, super can be considered as an object
(recall that this was not the case in the dynamic semantics).
rule <k> super => `class`(C') ...</k> <inClass> C:Id </inClass> <className> C </className> <baseClass> C':Id </baseClass>
There are several cases to consider here. First, if we are in a class
task, we should look up the member in the temporary class type
environment in cell ctenvT. That is because we want to allow
initialized field declarations in classes, such as int x=10; in a
class body. This is desugared to a declaration of x, which is added to
ctenvT during the class task processing, followed by an
assignment of 10 to x. In order for the assignment to type
check, we need to know that x has been declared with type
int; this information can only be found in the ctenvT cell.
Second, we should redirect non-local variable
lookups in method tasks to corresponding member accesses (the
local variables are handled by the rule borrowed from SIMPLE).
This is what the second rule below does. Third, we should allow
object member accesses as lvalues, which is done by the third rule
below (shown commented out as the OLD approach; the ltype context
declared earlier now covers this case). These last two rules therefore
ensure that each necessary
object member access is explicitly allowed for evaluation. Recall
from the annotated syntax module above that the member access
operation is strict in the object. That means that the object is
expected to evaluate to a class type. The next two rules below define
the actual member lookup operation, moving the search to the
superclass when the member is not found in the current class. Note
that this works because we create the class type environments
atomically; thus, a class either has its complete type environment
available, in which case these rules can safely apply, or its cell
ctenv is not yet available, in which case these rules have to
wait. Finally, the last rule below reports an error when the
Object class is reached.
rule <k> X:Id => T ...</k>
     <ctenvT>... X |-> T ...</ctenvT>
rule <k> X:Id => this . X ...</k>
     <tenv> Rho </tenv>
  requires notBool(X in keys(Rho))
// OLD approach:
// rule ltype(E:Exp . X:Id) => E . X
rule <k> `class`(C:Id) . X:Id => T ...</k>
     <className> C </className>
     <ctenv>... X |-> T:Type ...</ctenv>
rule <k> `class`(C1:Id => C2) . X:Id ...</k>
     <className> C1 </className>
     <baseClass> C2:Id </baseClass>
     <ctenv> Rho </ctenv>
  requires notBool(X in keys(Rho))
rule <k> `class`(Object) . X:Id => stuck(`class`(Object) . X) ...</k>
     <inClass> C:Id </inClass>
// <br/>
     <output>... .List => ListItem("Member \"" +String Id2String(X) +String "\" not declared! (see class \"" +String Id2String(C) +String "\")\n") </output>
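The member lookup performed by the last three rules can be pictured as a walk up the class hierarchy. The Python sketch below is a model under assumptions, not the K semantics itself: each class is represented by a hypothetical (base_class, ctenv) pair, the hierarchy is assumed to be acyclic, and lookup_member is an illustrative name.

```python
# Illustrative model of member lookup through superclasses; not K code.

def lookup_member(classes, class_name, member):
    """Find the type of `member`, searching `class_name` and then its superclasses.

    `classes` maps a class name to a (base_class, ctenv) pair, where `ctenv`
    maps member names to types. Returns None once the search reaches Object,
    which corresponds to the "not declared" error in the K rules.
    """
    current = class_name
    while current != "Object":
        base, ctenv = classes[current]
        if member in ctenv:
            return ctenv[member]
        current = base          # move the search to the superclass
    return None


sample_classes = {
    "A": ("Object", {"x": "int"}),
    "B": ("A", {"y": "bool"}),
}
```

One difference worth keeping in mind: in the K definition a class whose ctenv cell is not yet available simply blocks the lookup, whereas this sketch assumes all environments are already complete.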
As it is hard to check statically whether casting is always safe,
the programmer is simply trusted from a typing perspective. We only
do some basic upcasting and downcasting checks, to reject casts which
will absolutely fail. However, dynamic semantics or implementations
of the language need to insert runtime checks for downcasting to be safe.
rule `class`(_C1:Id) instanceOf _C2:Id => bool
rule (C:Id) `class`(C) => `class`(C)
rule <k> (C2:Id) `class`(C1:Id) => `class`(C2) ...</k>
     <className> C1 </className>
     <baseClasses>...SetItem(C2)...</baseClasses>  // upcast
rule <k> (C2:Id) `class`(C1:Id) => `class`(C2) ...</k>
     <className> C2 </className>
     <baseClasses>...SetItem(C1)...</baseClasses>  // downcast
rule <k> (C2) `class`(C1:Id) => stuck((C2) `class`(C1)) ...</k>
     <classData>...
       <className> C1 </className>
       <baseClasses> S1 </baseClasses>
     ...</classData>
     <classData>...
       <className> C2 </className>
       <baseClasses> S2 </baseClasses>
     ...</classData>
     <output>... .List => ListItem("Classes \"" +String Id2String(C1) +String "\" and \"" +String Id2String(C2) +String "\" are incompatible!\n") </output>
  requires notBool(C1 in S2) andBool notBool(C2 in S1)
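The cast rules above accept a cast exactly when the two classes are related in the hierarchy. A minimal Python sketch of that decision follows; it assumes a precomputed dictionary from each class to the set of all its superclasses, and check_cast is a hypothetical name.

```python
# Illustrative model of the static cast check; not K code.

def check_cast(all_supers, target, source):
    """Return `target` if casting a `source` object to `target` may succeed, else None.

    `all_supers` maps each class to the set of all its superclasses.
    """
    if target == source:
        return target                          # identity cast
    if target in all_supers.get(source, set()):
        return target                          # upcast: always safe
    if source in all_supers.get(target, set()):
        return target                          # downcast: accepted, needs a runtime check
    return None                                # unrelated classes: rejected statically


sample_supers = {"A": {"Object"}, "B": {"A", "Object"}, "C": {"Object"}}
```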
Finally, we need to clean up the terminated tasks. Each of the three
types of tasks is handled differently. The main task is replaced by a
method task holding the statement new Main(); which ensures that a
Main class with a Main() method actually exists
(first rule below). A class task moves its temporary class type
environment into its class' cell, and then it dissolves itself (second
rule). A method task simply dissolves when terminated (third rule);
the presence of the tenv cell in that rule ensures that the
task is a method task.
Finally, when all the tasks are cleaned up, we can also remove the
tasks cell, issuing a corresponding message. Note that
checking for cycles or duplicate methods can still be performed after
the tasks cell has been removed.
// discard main task when done, issuing a "new Main();" command to
// make sure that the class Main and the method Main() are declared.
rule <task> <k> stmt => new Main(.Exps); </k>
       (.Bag => <tenv> .Map </tenv>
                <returnType> void </returnType>
                <inClass> Main </inClass>)
     </task>
// discard class task when done, adding a ctenv in class
rule (<task>
        <k> stmt </k>
        <ctenvT> Rho </ctenvT>
        <inClass> C:Id </inClass>
      </task> => .Bag)
     <className> C </className>
     (.Bag => <ctenv> Rho </ctenv>)
// discard method task when done
rule <task>...
       <k> stmt </k>
       <tenv> _ </tenv>  // only to ensure that this is a method task
     ...</task> => .Bag
// cleanup tasks and output a success message when done
rule (<T>... <tasks> .Bag </tasks> ...</T> => .Bag)
     <output>... .List => ListItem("Type checked!\n") </output>
The subclass relation introduces a subtyping relation.
syntax KItem ::= checkSubtype(Types,Types)
rule checkSubtype(T:Type, T) => .K
rule <k> checkSubtype(`class`(C:Id), `class`(C':Id)) => .K ...</k>
     <className> C </className>
     <baseClasses>... SetItem(C') ...</baseClasses>
rule checkSubtype(Ts1->T2,Ts1'->T2') => checkSubtype(((T2)::Type,Ts1'),((T2')::Type,Ts1))
// note that the following rule would be wrong!
// rule checkSubtype(T[],T'[]) => checkSubtype(T,T')
rule checkSubtype((T:Type,Ts),(T':Type,Ts')) => checkSubtype(T,T') ~> checkSubtype(Ts,Ts')
  requires Ts =/=K .Types
rule checkSubtype(.Types,.Types) => .K
rule checkSubtype(.Types,void) => .K
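To make the variance in the function-type rule concrete, here is a small Python model of the subtype check. It is a sketch under assumptions rather than a translation of the K rules: primitive types are strings, class types are ("class", name) tuples, function types are ("fun", argument types, result type) tuples, array types are ("array", element type) tuples, and is_subtype is a hypothetical name. The special case relating an empty type list to void is left out.

```python
# Illustrative model of checkSubtype; the type encoding is an assumption.

def is_subtype(all_supers, sub, sup):
    """Return True if a value of type `sub` may be used where `sup` is expected."""
    if sub == sup:
        return True
    if isinstance(sub, tuple) and isinstance(sup, tuple):
        if sub[0] == "class" and sup[0] == "class":
            return sup[1] in all_supers.get(sub[1], set())
        if sub[0] == "fun" and sup[0] == "fun":
            _, sub_args, sub_res = sub
            _, sup_args, sup_res = sup
            return (
                len(sub_args) == len(sup_args)
                # co-variant in the result type
                and is_subtype(all_supers, sub_res, sup_res)
                # contra-variant in the argument types
                and all(is_subtype(all_supers, expected, given)
                        for given, expected in zip(sub_args, sup_args))
            )
    # Everything else, including arrays with different element types, is rejected.
    return False


supers = {"A": {"Object"}, "B": {"A", "Object"}}
A, B = ("class", "A"), ("class", "B")
```

Arrays are deliberately invariant here, in line with the comment in the K code that a rule descending into array element types would be wrong.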
Since now any Id can be used as the type of a class, we need to
check that the types used in the program actually exist.
syntax KItem ::= checkType(Types) rule checkType(T:Type,Ts:Types) => checkType(T) ~> checkType(Ts) requires Ts =/=K .Types rule checkType(.Types) => .K rule checkType(int) => .K rule checkType(bool) => .K rule checkType(string) => .K rule checkType(void) => .K rule <k> checkType(`class`(C:Id)) => .K ...</k> <className> C </className> rule checkType(`class`(Object)) => .K rule checkType(Ts:Types -> T:Type) => checkType(T,Ts) rule checkType(T:Type[]) => checkType(T)
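The well-formedness check is a structural recursion over types. The following Python sketch models it with the same hypothetical tuple encoding of types (strings for primitives, tagged tuples for class, array, and function types); is_well_formed and PRIMITIVE_TYPES are illustrative names.

```python
# Illustrative model of checkType; the type encoding is an assumption.

PRIMITIVE_TYPES = {"int", "bool", "string", "void"}


def is_well_formed(declared_classes, ty):
    """Return True if `ty` is built only from primitive and declared class types."""
    if isinstance(ty, str):
        return ty in PRIMITIVE_TYPES
    tag = ty[0]
    if tag == "class":
        return ty[1] == "Object" or ty[1] in declared_classes
    if tag == "array":
        return is_well_formed(declared_classes, ty[1])
    if tag == "fun":
        _, arg_types, result_type = ty
        return is_well_formed(declared_classes, result_type) and all(
            is_well_formed(declared_classes, a) for a in arg_types
        )
    return False


declared = {"Point"}
ok_type = ("fun", (("class", "Point"), "int"), ("array", ("class", "Object")))
bad_type = ("array", ("class", "Missing"))
```

In the K definition an undeclared class does not produce False; the checkType task simply cannot make progress until a matching className cell appears, so the sketch's boolean result is a simplification.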
The checkMethod operation below searches to see whether
the current method overrides some other method in some superclass.
If yes, then it issues an additional check that the new method's type
is more concrete than the overridden method's. The types T and T'
below can only be function types. See the definition of
checkSubtype on function types at the end of this module (it
is co-variant in the codomain and contra-variant in the domain).
syntax KItem ::= checkMethod(Id,Type,Id) rule <k> checkMethod(F:Id, T:Type, C:Id) => checkSubtype(T, T') ...</k> <className> C </className> <ctenv>... F |-> T':Type ...</ctenv> rule <k> checkMethod(F:Id, _T:Type, (C:Id => C')) ...</k> <className> C </className> <baseClass> C':Id </baseClass> <ctenv> Rho </ctenv> requires notBool(F in keys(Rho)) rule checkMethod(_:Id,_,Object) => .K
syntax KItem ::= stuck(K) syntax KItem ::= "discard" rule _:KResult ~> discard => .K endmodule
Author: Grigore Roșu (grosu@illinois.edu)
Organization: University of Illinois at Urbana-Champaign
Author: Traian Florin Șerbănuță (traian.serbanuta@unibuc.ro)
Organization: University of Bucharest
This is the K semantic definition of the untyped FUN language.
FUN is a pedagogical and research language that captures the essence
of the functional programming paradigm, extended with several features
often encountered in functional programming languages.
Like many functional languages, FUN is an expression language, that
is, everything, including the main program, is an expression.
Functions can be declared anywhere and are first class values in the
language.
FUN is call-by-value here, but it has been extended (as student
homework assignments) with other parameter-passing styles.
To make it more interesting and to highlight some of K's strengths,
FUN includes the following features:
The basic builtin data-types of integers, booleans and strings.
Builtin lists, which can hold any elements, including other lists.
Lists are enclosed in square brackets and their elements are
comma-separated; e.g., [1,2,3].
User-defined data-types, by means of constructor terms.
Constructor names start with a capital letter (while any other
identifier in the language starts with a lowercase letter), and they
can be followed by an arbitrary number of comma-separated arguments
enclosed in parentheses; parentheses are not needed when the
constructor takes no arguments.
For example, Pair(5,7) is a constructor term holding two
numbers, Cons(1,Cons(2,Cons(3,Nil))) is a list-like
constructor term holding 3 elements, and
Tree(Tree(Leaf(1), Leaf(2)), Leaf(3)) is a tree-like
constructor term holding 3 elements.
In the untyped version of the FUN language, no type checking or
inference is performed to ensure that the data constructors are used
correctly.
The execution will simply get stuck when they are misused.
Moreover, since no type checking is performed, the data-types are not
even declared in the untyped version of FUN.
Functions and let/letrec binders can take
multiple space-separated arguments, but these are desugared to
ones that only take one argument, by currying. For example, the
expressions
fun x y -> x y
let x y = y in x
are desugared, respectively, into the following expressions:
fun x -> fun y -> x y
let x = fun y -> y in x
Functions can be defined using pattern matching over the
available data-types. For example, the program
letrec max = fun [h] -> h
| [h|t] -> let x = max t
in if h > x then h else x
in max [1, 3, 5, 2, 4, 0, -1, -5]
defines a function max that calculates the maximum element of
a non-empty list, and the function
letrec ack = fun Pair(0,n) -> n + 1
| Pair(m,0) -> ack Pair(m - 1, 1)
| Pair(m,n) -> ack Pair(m - 1, ack Pair(m, n - 1))
in ack Pair(2,3)
calculates the Ackermann function applied to a particular pair of numbers.
Patterns can be nested. Patterns can currently only be used in function
definitions, and not directly in let/letrec binders.
For example, this is not allowed:
letrec Pair(x,y) = Pair(1,2) in x+y
But this is allowed:
let f Pair(x,y) = x+y in f Pair(1,2)
because it is first reduced to
let f = fun Pair(x,y) -> x+y in f Pair(1,2)
by uncurrying of the let binder, and pattern matching is
allowed in function arguments.
We include a callcc construct, for two reasons: first,
several functional languages support this construct; second, some
semantic frameworks have difficulties defining it. Not K.
Finally, we include mutables by means of referencing an
expression, getting the reference of a variable, dereferencing and
assignment. We include these for the same reasons as above: there are
languages which have them, and they are not easy to define in some
semantic frameworks.
Like in many other languages, some of FUN's constructs can be
desugared into a smaller set of basic constructs. We do that as usual,
using macros, and then we only give semantics to the core constructs.
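The currying desugaring mentioned in the feature list is a simple fold over the parameter list. The Python sketch below models it on a hypothetical tuple-based AST (("fun", param, body) and ("let", name, bound, body)); the function names curry and desugar_let are illustrative and do not correspond to FUN's actual macros.

```python
# Illustrative model of desugaring multi-argument binders by currying.

def curry(params, body):
    """Turn `fun p1 p2 ... pn -> body` into nested one-parameter functions."""
    result = body
    for param in reversed(params):
        result = ("fun", param, result)
    return result


def desugar_let(name, params, bound, body):
    """Turn `let f p1 ... pn = e in b` into `let f = fun p1 -> ... -> e in b`."""
    return ("let", name, curry(params, bound), body)


# fun x y -> x y   becomes   fun x -> fun y -> x y
curried = curry(["x", "y"], ("app", "x", "y"))
# let x y = y in x   becomes   let x = fun y -> y in x
let_form = desugar_let("x", ["y"], "y", "x")
```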
Note:
We recommend that the reader first consult the dynamic semantics of the
LAMBDA++ language in the first part of the K Tutorial.
To keep the comments below small and focused, we will not re-explain
functional or K features that have already been explained there.
//require "modules/pattern-matching.k"
module FUN-UNTYPED-COMMON
  imports DOMAINS-SYNTAX
FUN is an expression language. The constructs below fall into
several categories: names, arithmetic constructs, conventional
functional constructs, patterns and pattern matching, data constructs,
lists, references, and call-with-current-continuation (callcc).
The arithmetic constructs are standard; they are present in almost all
our K language definitions. The meanings of FUN's constructs are
discussed in more depth when we define their semantics in the next
module.
We start with the syntactic definition of FUN names.
We have several categories of names: ones to be used for functions and
variables, others to be used for data constructors, others for types and
others for type variables. We will introduce them as needed, starting
with the former category. We prefer the names of variables and functions
to start with lower case letters. We take the freedom to tacitly introduce
syntactic lists/sequences for each nonterminal for which we need them:
syntax Name [token] syntax Names ::= List{Name,","} [overload(exps)]
Expression constructs will be defined throughout the syntax module.
Below are the very basic ones, namely the builtins, the names, and the
parentheses used as brackets for grouping. Lists of expressions are
declared strict, so all expressions in the list get evaluated whenever
the list is in a position which can be evaluated:
  syntax Exp ::= Int | Bool | String | Name
               | "(" Exp ")"                       [bracket]
  syntax Exps ::= List{Exp,","}                    [strict, overload(exps)]
  syntax Val
  syntax Exp ::= Val
  syntax Exps ::= Vals
  syntax Vals ::= List{Val,","}                    [overload(exps)]
  syntax Bottom
  syntax Bottoms ::= List{Bottom,","}              [overload(exps)]
We next define the syntax of arithmetic constructs, together with
their relative priorities and left-/non-associativities. We also
tag all these rules as members of a new group, "arith", so we can more easily
define global syntax priorities later (at the end of the syntax module).
  syntax Exp ::= left:
                 Exp "*" Exp                       [strict, group(arith)]
               | Exp "/" Exp                       [strict, group(arith)]
               | Exp "%" Exp                       [strict, group(arith)]
               > left:
                 Exp "+" Exp                       [strict, left, group(arith)]
               | Exp "^" Exp                       [strict, left, group(arith)]
// left attribute should not be necessary; currently a parsing bug
               | Exp "-" Exp                       [strict, prefer, group(arith)]
// the "prefer" attribute above is to not parse x-1 as x(-1)
// Due to some parsing problems, we currently cannot add unary minus:
               | "-" Exp                           [strict, group(arith)]
               > non-assoc:
                 Exp "<" Exp                       [strict, group(arith)]
               | Exp "<=" Exp                      [strict, group(arith)]
               | Exp ">" Exp                       [strict, group(arith)]
               | Exp ">=" Exp                      [strict, group(arith)]
               | Exp "==" Exp                      [strict, group(arith)]
               | Exp "!=" Exp                      [strict, group(arith)]
               > "!" Exp                           [strict, group(arith)]
               > Exp "&&" Exp                      [strict(1), left, group(arith)]
               > Exp "||" Exp                      [strict(1), left, group(arith)]
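As a small illustration of these priorities, here is a sketch of a FUN program. We have not run it through the tool; the expected result is read off the grammar above and the arithmetic rules given later in the semantics module. Multiplication binds tighter than addition, both bind tighter than ==, and && comes last:

```
// parses as ((2 + (3 * 4)) == 14) && (("ab" ^ "cd") == "abcd")
2 + 3 * 4 == 14 && "ab" ^ "cd" == "abcd"
```

Under those rules this should evaluate to true. Note that ^ is string concatenation in FUN, not exponentiation.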
The conditional construct has the expected evaluation strategy,
stating that only the first argument is evaluated:
syntax Exp ::= "if" Exp "then" Exp "else" Exp [strict(1)]
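To see why strict(1) matters, consider the following sketch of a FUN program (an illustration of ours, not taken from the test suite). Only the condition is evaluated up front; the branch that is not selected is never evaluated:

```
// the else branch would get stuck (division by zero has no rule),
// but it is never reached because the condition evaluates to true
if 1 < 2 then 10 else 1 / 0
```

With the rules for the conditional given in the semantics module, this should evaluate to 10. Had the construct been declared fully strict, the program would instead get stuck on 1 / 0.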
FUN's builtin lists are formed by enclosing comma-separated
sequences of expressions (i.e., terms of sort Exps) in square
brackets. The list constructor cons adds a new element to the
top of the list, head and tail get the first element
and the tail sublist of a list if they exist, respectively, and get
stuck otherwise, and null? tests whether a list is empty or
not; syntactically, these are just expression constants.
In function patterns, we are also going to allow patterns following the
usual head/tail notation; for example, the pattern [x_1,...,x_n|t]
binds x_1, ..., x_n to the first elements of the matched list,
and t to the list formed with the remaining elements. We define list
patterns as ordinary expression constructs, although we will make sure that
we do not give them semantics if they appear in any place other than a
function case pattern.
  syntax Exp ::= "[" Exps "]"                      [strict, klabel(list)]
               | "head" [macro]
               | "tail" [macro]
               | "null?" [macro]
               | "[" Exps "|" Exp "]"
  syntax Val ::= "[" Vals "]"                      [klabel(list)]
  syntax Cons ::= "cons"
  syntax Val ::= Cons
  syntax Val ::= Cons Val                          [klabel(apply)]
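The list operations can be combined as in the following sketch of a FUN program (our own example, assuming the semantics of cons and the head/tail macros defined later in this lesson):

```
// cons 1 [2, 3] builds [1, 2, 3]; tail drops the first element,
// and head then returns the first element of what remains
head (tail (cons 1 [2, 3]))
```

Following those rules by hand, the program should evaluate to 2. Applying head to an empty list, as in head [], has no matching case and therefore gets stuck.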
Data constructors start with capital letters and they may or may
not have arguments. We need to use the attribute "prefer" to make
sure that, e.g., Cons(a) parses as constructor Cons with
argument a, and not as the expression Cons (because
constructor names are also expressions) regarded as a function applied
to the expression a. Also, note that the constructor is strict
in its second argument, because we want to evaluate its arguments but
not the constructor name itself.
  syntax ConstructorName [token]
  syntax Exp ::= ConstructorName
               | ConstructorName "(" Exps ")"      [prefer, strict(2), klabel(constructor)]
  syntax Val ::= ConstructorName "(" Vals ")"      [klabel(constructor)]
A function is essentially a |-separated ordered
sequence of cases, each case of the form pattern -> expression,
preceded by the language construct fun. Patterns will be defined
shortly, both for the builtin lists and for user-defined constructors.
Recall that the syntax we define in K is not meant to serve as an
ultimate parser for the defined language, but rather as a convenient
notation for K abstract syntax trees, which we prefer when we write
the semantic rules. It is therefore often the case that we define a
more "generous" syntax than we want to allow programs to use.
We do it here, too. Specifically, the syntax of Cases
below allows any expressions to appear as patterns. This syntactic
relaxation permits many wrong programs to be parsed, but that is not a
problem because we are not going to give semantics to wrong combinations,
so those programs will get stuck; moreover, our type inferencer will reject
those programs anyway. Function application is just concatenation of
expressions, without worrying about type correctness. Again, the type
system will reject type-incorrect programs.
  syntax Exp ::= "fun" Cases
               | Exp Exp                           [strict, left, klabel(apply)]
// NOTE: We would like eventually to also have Exp "(" Exps ")
  syntax Case ::= Exp "->" Exp
  syntax Cases ::= List{Case, "|"}
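The following sketch of a FUN program shows a two-case function applied by juxtaposition. The constructor names Pair and Single are ours and purely illustrative; nothing in the untyped language declares them:

```
// cases are tried in order: Pair(x, y) does not match Single(5),
// so the second case is selected and x is bound to 5
(fun Pair(x, y) -> x + y
   | Single(x)  -> x) Single(5)
```

With the closure application rules given in the semantics module, this should evaluate to 5. An argument that matches no case, such as Triple(1,2,3), would leave the program stuck.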
The let and letrec binders have the usual syntax
and functional meaning. We allow multiple and-separated bindings.
Like for the function cases above, we allow a more generous syntax for
the left-hand sides of bindings, noting that the semantics will get stuck
on incorrect bindings and that the type system will reject those programs.
  syntax Exp ::= "let" Bindings "in" Exp
               | "letrec" Bindings "in" Exp        [prefer]
// The "prefer" attribute for letrec currently needed due to tool bug,
// to make sure that "letrec" is not parsed as "let rec".
  syntax Binding ::= Exp "=" Exp
  syntax Bindings ::= List{Binding,"and"}
References are first class values in FUN. The construct ref
takes an expression, evaluates it, stores the resulting value
at a fresh location in the store, and returns that reference. Syntactically,
ref is just an expression constant. The construct &
takes a name as argument and evaluates to a reference, namely the store
reference where the variable passed as argument stores its value; this
construct is a bit controversial and is further discussed in the
environment-based semantics of the FUN language, where we desugar
ref to it. The construct @ takes a reference
and evaluates to the value stored there. The construct := takes
two expressions, the first expected to evaluate to a reference; the value
of its second argument will be stored at the location to which the first
points (the old value is thus lost). Finally, since expression evaluation
now has side effects, it makes sense to also add a sequential composition
construct, which is sequentially strict. This evaluates to the value of
its second argument; the value of the first argument is lost (which has
therefore been evaluated only for its side effects).
  syntax Exp ::= "ref" [macro]
               | "&" Name
               | "@" Exp                           [strict]
               | Exp ":=" Exp                      [strict]
               | Exp ";" Exp                       [strict(1), right]
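A minimal sketch of how these constructs fit together, written as a FUN program of our own (the parentheses are redundant under the priorities declared at the end of the syntax module, and are there only for readability):

```
// r holds a reference to a fresh location containing 5;
// the assignment overwrites it with 6, and the final @ r reads it back
let r = ref 5
in ((r := (@ r) + 1) ; (@ r))
```

Following the rules for references in the semantics module, this should evaluate to 6. The value of the assignment itself is discarded by the sequential composition.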
Call-with-current-continuation, named callcc in FUN, is a
powerful control operator that originated in the Scheme programming
language, but it now exists in many other functional languages. It works
by evaluating its argument, expected to evaluate to a function, and by
passing the current continuation, or evaluation context (or computation,
in K terminology), as a special value to it. When/If this special value
is invoked, the current context is discarded and replaced with the one
held by the special value and the computation continues from there.
It is like taking a snapshot of the execution context at some moment
in time and then, when desired, being able to get back in time to that
point. If you like games, it is like saving the game now (so you can
work on your homework!) and then continuing the game tomorrow or whenever
you wish. To illustrate the strength of callcc, we also
allow exceptions in FUN by means of a conventional try-catch
construct, which will desugar to callcc. We also need to
introduce the special expression constant throw, but we need to
use it as a function argument name in the desugaring macro, so we define
it as a name instead of as an expression constant:
  syntax Exp ::= "try" Exp "catch" "(" Name ")" Exp  [macro]
  syntax Val ::= "callcc"
  syntax Name ::= "throw" [token]
Finally, FUN also allows polymorphic datatype declarations. These
will be useful when we define the type system later on.
  syntax Exp ::= "datatype" Type "=" TypeCases Exp  [macro]
// NOTE: In a future version of K, we want the datatype declaration
// to be a construct by itself, but that is not possible currently
// because K's parser wrongly identifies the __ operation allowing
// a declaration to appear in front of an expression with the function
// application construct, giving ambiguous parsing errors.
We next need to define the syntax of types and type cases that appear
in datatype declarations.
Like in many functional languages, type parameters/variables in
user-defined types are quoted identifiers.
  syntax TypeVar [token]
  syntax TypeVars ::= List{TypeVar,","}            [overload(types)]
Types can be basic types, function types, or user-defined
parametric types. In the dynamic semantics we are going to simply ignore
all the type declarations, so the syntax of types below is only useful
for generating the desired parser. To avoid syntactic ambiguities with
the arrow construct for function cases, we use the symbol --> as
a constructor for function types:
  syntax TypeName [token]
  syntax Type ::= "int" | "bool" | "string"
                | Type "-->" Type                  [right]
                | "(" Type ")"                     [bracket]
                | TypeVar
                | TypeName                         [klabel(TypeName), avoid]
                | Type TypeName                    [klabel(Type-TypeName), symbol, macro]
                | "(" Types ")" TypeName           [prefer]
  syntax Types ::= List{Type,","}                  [overload(types)]
  syntax Types ::= TypeVars
  syntax TypeCase ::= ConstructorName
                    | ConstructorName "(" Types ")"
  syntax TypeCases ::= List{TypeCase,"|"}          [symbol(_|TypeCase_)]
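As a hedged illustration of the declaration syntax, here is a sketch of a FUN program with a two-parameter datatype. The names pair, Pair and swap are ours, and we have not checked the program against the parser:

```
// the datatype line is only parsed; the untyped semantics discards it
datatype ('a, 'b) pair = Pair('a, 'b)
let swap = fun Pair(x, y) -> Pair(y, x)
in swap Pair(1, true)
```

Since the dynamic semantics ignores the declaration, the result is determined by the let and the function alone, and should be the constructor value Pair(true,1).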
  syntax priority @__FUN-UNTYPED-COMMON
                > apply
                > arith
                > _:=__FUN-UNTYPED-COMMON
                > let_in__FUN-UNTYPED-COMMON
                  letrec_in__FUN-UNTYPED-COMMON
                  if_then_else__FUN-UNTYPED-COMMON
                > _;__FUN-UNTYPED-COMMON
                > fun__FUN-UNTYPED-COMMON
                > datatype_=___FUN-UNTYPED-COMMON
endmodule
module FUN-UNTYPED-MACROS
  imports FUN-UNTYPED-COMMON
We desugar the list non-constructor operations to functions matching
over list patterns. In order to do that we need some new variables; for
those, we follow the same convention as in the K tutorial, where we
added them as new identifier constructs starting with the character $,
so we can easily recognize them when we debug or trace the semantics.
  syntax Name ::= "$h" [token] | "$t" [token]
  rule head => fun [$h|$t] -> $h
  rule tail => fun [$h|$t] -> $t
  rule null? => fun [.Exps] -> true | [$h|$t] -> false
Multiple-head list patterns desugar into successive one-head patterns:
rule [E1,E2,Es:Exps|T] => [E1|[E2,Es|T]] [anywhere]
Uncurrying of multiple arguments in functions and binders:
  rule P1 P2 -> E => P1 -> fun P2 -> E             [anywhere]
  rule F P = E => F = fun P -> E                   [anywhere]
We desugar the try-catch construct into callcc:
  syntax Name ::= "$k" [token] | "$v" [token]
  rule try E catch(X) E'
    => callcc (fun $k -> (fun throw -> E)(fun X -> $k E'))
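To make the desugaring concrete, consider this sketch of a FUN program (our own example; the parentheses are only there to keep the parse unambiguous for the reader):

```
// throw 5 abandons the pending addition and jumps to the handler,
// which binds x to 5 and returns x * 2 as the value of the whole try
try (1 + (throw 5)) catch(x) (x * 2)
```

By the macro above, throw is bound to fun x -> $k (x * 2), where $k is the continuation captured by callcc around the whole try. Applying throw to 5 therefore passes 10 to $k, and the program should evaluate to 10. If the body never calls throw, its value is returned normally.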
For uniformity, we reduce all types to their general form:
rule `Type-TypeName`(T:Type, Tn:TypeName) => (T) Tn
The dynamic semantics ignores all the type declarations:
  rule datatype _T = _TCs E => E
endmodule
module FUN-UNTYPED-SYNTAX
  imports FUN-UNTYPED-COMMON
  imports BUILTIN-ID-TOKENS
  syntax Name ::= r"[a-z][_a-zA-Z0-9]*"            [token, prec(2)]
                | #LowerId                         [token]
  syntax ConstructorName ::= #UpperId              [token]
  syntax TypeVar ::= r"['][a-z][_a-zA-Z0-9]*"      [token]
  syntax TypeName ::= Name                         [token]
endmodule
The semantics below is environment-based. A substitution-based
definition of FUN is also available, but that drops the & construct
as explained above.
module FUN-UNTYPED
  imports FUN-UNTYPED-COMMON
  imports FUN-UNTYPED-MACROS
  imports DOMAINS
  //imports PATTERN-MATCHING
The k, env, and store cells are standard
(see, for example, the definition of LAMBDA++ or IMP++ in the first
part of the K tutorial).
  configuration <T color="yellow">
                  <k color="green"> $PGM:Exp </k>
                  <env color="violet"> .Map </env>
                  <store color="white"> .Map </store>
                </T>
We only define integers, Booleans and strings as values here, but will
add more values later.
  syntax Val ::= Int | Bool | String
  syntax Vals ::= Bottoms
  syntax KResult ::= Val
  rule <k> X:Name => V ...</k>
       <env>... X |-> L ...</env>
       <store>... L |-> V ...</store>
  rule I1 * I2 => I1 *Int I2
  rule I1 / I2 => I1 /Int I2 requires I2 =/=K 0
  rule I1 % I2 => I1 %Int I2 requires I2 =/=K 0
  rule I1 + I2 => I1 +Int I2
  rule S1 ^ S2 => S1 +String S2
  rule I1 - I2 => I1 -Int I2
  rule - I => 0 -Int I
  rule I1 < I2 => I1 <Int I2
  rule I1 <= I2 => I1 <=Int I2
  rule I1 > I2 => I1 >Int I2
  rule I1 >= I2 => I1 >=Int I2
  rule V1:Val == V2:Val => V1 ==K V2
  rule V1:Val != V2:Val => V1 =/=K V2
  rule ! T => notBool(T)
  rule true  && E => E
  rule false && _ => false
  rule true  || _ => true
  rule false || E => E
  rule if  true then E else _ => E
  rule if false then _ else E => E
We have already declared the syntactic list of expressions strict, so
we can assume that all the elements that appear in a FUN list are
evaluated. The only thing left to do is to state that a list of
values is a value itself, that is, that the list square-bracket
construct is indeed a constructor, and to give the semantics of
cons. Since cons is a builtin function and is
expected to take two arguments, we have to also state that
cons itself is a value (specifically, a function/closure
value, but we do not need that level of detail here), and also that
cons applied to a value is a value (specifically, it would be
a function/closure value that expects the second, list argument):
rule cons V:Val [Vs:Vals] => [V,Vs]
Constructors take values as arguments and produce other values:
syntax Val ::= ConstructorName
Like in the environment-based semantics of LAMBDA++ in the first part
of the K tutorial, functions evaluate to closures. A closure includes
the current environment besides the function contents; the environment
will be used at execution time to look up all the variables that appear
free in the function body (we want static scoping in FUN).
  syntax Val ::= closure(Map,Cases)
  rule <k> fun Cases => closure(Rho,Cases) ...</k>
       <env> Rho </env>
Note: The reader may want to get familiar with
how the pre-defined pattern matching works before proceeding.
The best way to do that is to consult
k/include/modules/pattern-matching.k.
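The effect of capturing the environment in the closure can be seen in the following sketch of a FUN program (our own example, traced by hand against the rules in this module):

```
// f is closed over the environment in which x is bound to 1;
// the later rebinding of x to 100 does not affect the body of f
let x = 1
in let f = fun y -> x + y
   in let x = 100
      in f 10
```

Under static scoping, as implemented by the closure rule above, this should evaluate to 11. A dynamically scoped language would look x up in the environment of the call and produce 110 instead.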
We distinguish two cases when the closure is applied.
If the first pattern matches, then we pick the first case: switch to
the closed environment, get the matching map and bind all its
variables, and finally evaluate the function body of the first case,
making sure that the environment is properly recovered afterwards.
If the first pattern does not match, then we drop it and thus move on
to the next one.
  rule (.K => getMatching(P, V)) ~> closure(_, P->_ | _) V:Val
  rule <k> matchResult(M:Map) ~> closure(Rho, _->E | _) _
        => bindMap(M) ~> E ~> setEnv(Rho') ...</k>
       <env> Rho' => Rho </env>
  rule (matchFailure => .K) ~> closure(_, (_->_ | Cs:Cases => Cs)) _
//  rule <k> closure(Rho, P->E | _) V:Val
//        => bindMap(getMatching(P,V)) ~> E ~> setEnv(Rho') ...</k>
//       <env> Rho' => Rho </env>  when isMatching(P,V)
//  rule closure(_, (P->_ | Cs:Cases => Cs)) V:Val  when notBool isMatching(P,V)
To highlight the similarities and differences between let and
letrec, we prefer to give them direct semantics instead of
desugaring them like in LAMBDA. See the formal definitions of
bindTo, bind, and assignTo at the end of
this module. Informally, bindTo(Xs, Es) first
evaluates the expressions Es (of sort Exps) in the current
environment (i.e., it is strict in its second argument), then it binds
the variables Xs (of sort Names) to new locations and adds
those bindings to the environment, and finally writes the values
previously obtained after evaluating the expressions Es to those
new locations; bind(Xs) only binds
Xs to new locations and adds those bindings to the environment;
and assignTo(Xs,Es) evaluates the expressions
Es in the current environment and then writes the resulting
values to the locations to which the variables Xs are already
bound in the environment.
Therefore, let Xs = Es in E first
evaluates Es in the current environment, then adds new
bindings for Xs to fresh locations in the environment, then
writes the values of Es to those locations, and finally
evaluates E in the new environment, making sure that the
environment is properly recovered after the evaluation of E.
On the other hand, letrec does the same things but in a
different order: it first adds new bindings for Xs to fresh
locations in the environment, then it evaluates Es in the new
environment, then it writes the resulting values to their
corresponding locations, and finally it evaluates E and
recovers the environment. The crucial difference is that the
expressions Es now see the locations of the variables Xs
in the environment, so if they are functions, which is typically the
case with letrec, their closures will encapsulate in their
environments the bindings of all the bound variables, including
themselves (thus, we may have a closure value stored at location
L, whose environment contains a binding of the form
F ↦ L; this way, the closure can invoke
itself).
  rule <k> let Bs in E
        => bindTo(names(Bs),exps(Bs)) ~> E ~> setEnv(Rho) ...</k>
       <env> Rho </env>
  rule <k> letrec Bs in E
        => bind(names(Bs))~>assignTo(names(Bs),exps(Bs))~>E~>setEnv(Rho)...</k>
       <env> Rho </env>
Recall that our syntax allows let and letrec to
take any expression as the left-hand side of a binding. This allows us to use
the already existing function application construct to bind names to
functions, such as, e.g., let x y = y in ... .
The desugaring macro in the syntax module uncurries such declarations,
and then the semantic rules above only work when the remaining
bindings are identifiers, so the semantics will get stuck on programs
that misuse the let and letrec binders.
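The difference in ordering between the two rules can be observed with a sketch like the following FUN program (our own example, traced by hand):

```
// with let, both right-hand sides are evaluated in the outer
// environment, so y receives the value of the outer x
let x = 1
in let x = 2 and y = x
   in y
```

By the let rule, the expressions 2 and x are evaluated before the new bindings exist, so the program should evaluate to 1. If the inner let is replaced by letrec, x is first bound to a fresh location that holds no value yet, and evaluating the right-hand side x finds nothing in the store at that location; as far as we can tell from the lookup rule, the program then gets stuck. This is why letrec is typically used with function right-hand sides, which are not looked up until they are called.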
The semantics of references is self-explanatory, except maybe for the
desugaring rule of ref, which is further discussed below. Note
that &X grabs the location of X from the environment.
Sequential composition, which is needed only to accumulate the
side effects due to assignments, was declared strict in its first argument.
Once evaluated, its first argument is simply discarded:
  syntax Name ::= "$x" [token]
  rule ref => fun $x -> & $x
  rule <k> & X => L ...</k>  <env>... X |-> L ...</env>
  rule <k> @ L:Int => V:Val ...</k>  <store>... L |-> V ...</store>
  rule <k> L:Int := V:Val => V ...</k>  <store>... L |-> (_=>V) ...</store>
  rule _V:Val; E => E
The desugaring rule of ref (first rule above) works
because & takes a variable and returns its location (like in C).
Note that some "pure" functional programming researchers strongly dislike
the & construct, but favor ref. We refrain from having
a personal opinion on this issue here, but support & in the
environment-based definition of FUN because it is, technically speaking,
more powerful than ref. From a language design perspective, it
would be equally easy to drop & and instead give a direct
semantics to ref. In fact, this is precisely what we do in the
substitution-based definition of FUN, because there appears to be no way
to give a substitution-based definition to the & construct.
As we know from the LAMBDA++ tutorial, call-with-current-continuation
is quite easy to define in K. We first need to define a special
value wrapping an execution context, that is, an environment saying
where the variables should be looked up, and a computation structure
saying what is left to execute (in a substitution-based definition,
this special value would be even simpler, as it would only need to
wrap the computation structure; see, for example, the
substitution-based semantics of LAMBDA++ in the first part of the
K tutorial, or the substitution-based definition of FUN). Then
callcc creates such a value containing the current
environment and the current remaining computation, and passes it to
its argument function. When/If invoked, the special value replaces
the current execution context with its own and continues the execution
normally.
  syntax Val ::= cc(Map,K)
  rule <k> (callcc V:Val => V cc(Rho,K)) ~> K </k>  <env> Rho </env>
  rule <k> cc(Rho,K) V:Val ~> _ => V ~> K </k>  <env> _ => Rho </env>
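A short sketch of these two rules at work, as a FUN program of our own traced by hand:

```
// k captures the context "1 + []"; invoking k with 10 discards the
// pending "2 + []" inside the function and resumes that saved context
1 + callcc (fun k -> 2 + (k 10))
```

When callcc is reached, the rest of the computation is the pending addition with 1, and that is what gets wrapped in the cc value. Invoking k 10 replaces the current computation with the saved one, so the program should evaluate to 11 rather than 13.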
The environment recovery operation is the same as for the LAMBDA++
language in the K tutorial and many other languages provided with the
K distribution. The first "anywhere" rule below shows an elegant
way to achieve the benefits of tail recursion in K.
  syntax KItem ::= setEnv(Map)  // TODO: get rid of env
  //rule (setEnv(_) => .) ~> setEnv(_)  [anywhere]
  rule <k> _:Val ~> (setEnv(Rho) => .K) ...</k>  <env> _ => Rho </env>
bindTo, bind and assignTo
The meaning of these operations has already been explained when we
discussed the let and letrec language constructs
above.
  syntax KItem ::= bindTo(Names,Exps)              [strict(2)]
                 | bindMap(Map)
                 | bind(Names)
  rule (.K => getMatchingAux(Xs,Vs)) ~> bindTo(Xs:Names,Vs:Vals)
  rule matchResult(M:Map) ~> bindTo(_:Names, _:Vals) => bindMap(M)
  rule bindMap(.Map) => .K
  rule <k> bindMap((X:Name |-> V:Val => .Map) _:Map) ...</k>
       <env> Rho => Rho[X <- !L:Int] </env>
       <store>... .Map => !L |-> V ...</store>
  rule bind(.Names) => .K
  rule <k> bind(X:Name,Xs => Xs) ...</k>
       <env> Rho => Rho[X <- !_L:Int] </env>
  syntax KItem ::= assignTo(Names,Exps)            [strict(2)]
  rule <k> assignTo(.Names,.Vals) => .K ...</k>
  rule <k> assignTo((X:Name,Xs => Xs),(V:Val,Vs:Vals => Vs)) ...</k>
       <env>... X |-> L ...</env>
       <store>... .Map => L |-> V ...</store>
The following auxiliary operations extract the list of identifiers
and of expressions in a binding, respectively.
  syntax Names ::= names(Bindings)                 [function]
  rule names(.Bindings) => .Names
  rule names(X:Name=_ and Bs) => (X,names(Bs))::Names
  syntax Exps ::= exps(Bindings)                   [function]
  rule exps(.Bindings) => .Exps
  rule exps(_:Name=E and Bs) => E,exps(Bs)
  /* Extra kore stuff */
  syntax KResult ::= Vals
  syntax Exps ::= Names
  syntax Names ::= Bottoms
  /* Matching */
  syntax MatchResult ::= getMatching(Exp, Val)                    [function]
                       | getMatchingAux(Exps, Vals)               [function]
                       | mergeMatching(MatchResult, MatchResult)  [function]
                       | matchResult(Map)
                       | "matchFailure"
  rule getMatching(C:ConstructorName(Es:Exps), C(Vs:Vals)) => getMatchingAux(Es, Vs)
  rule getMatching([Es:Exps], [Vs:Vals]) => getMatchingAux(Es, Vs)
  rule getMatching(C:ConstructorName, C) => matchResult(.Map)
  rule getMatching(B:Bool, B) => matchResult(.Map)
  rule getMatching(I:Int, I) => matchResult(.Map)
  rule getMatching(S:String, S) => matchResult(.Map)
  rule getMatching(N:Name, V:Val) => matchResult(N |-> V)
  rule getMatching(_, _) => matchFailure           [owise]
  rule getMatchingAux((E:Exp, Es:Exps), (V:Val, Vs:Vals))
    => mergeMatching(getMatching(E, V), getMatchingAux(Es, Vs))
  rule getMatchingAux(.Exps, .Vals) => matchResult(.Map)
  rule getMatchingAux(_, _) => matchFailure        [owise]
  rule mergeMatching(matchResult(M1:Map), matchResult(M2:Map)) => matchResult(M1 M2)
    requires intersectSet(keys(M1), keys(M2)) ==K .Set
  //rule mergeMatching(_, _) => matchFailure       [owsie]
  rule mergeMatching(matchResult(_:Map), matchFailure) => matchFailure
  rule mergeMatching(matchFailure, matchResult(_:Map)) => matchFailure
  rule mergeMatching(matchFailure, matchFailure) => matchFailure
Besides the generic decomposition rules for patterns and values,
we also want to allow [head|tail] matching for lists, so we add
the following custom pattern decomposition rule:
  rule getMatching([H:Exp | T:Exp], [V:Val, Vs:Vals])
    => getMatchingAux((H, T), (V, [Vs]))
endmodule
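The matching rules can be exercised with a sketch such as the following FUN program, a list-length function of our own that uses both the empty-list pattern and the [head|tail] pattern:

```
// [] only matches the empty list; for a non-empty list that match
// fails and the [h|t] case binds h to the first element and t to the rest
letrec len = fun []    -> 0
               | [h|t] -> 1 + len t
in len [7, 8, 9]
```

Tracing the rules by hand: matching [] against [7,8,9] reaches getMatchingAux with lists of different lengths and yields matchFailure, so the first case is dropped; the second case succeeds through the custom [H|T] rule, with t bound to [8,9]. The recursion bottoms out when t is the empty list, so the program should evaluate to 3.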
Author: Grigore Roșu (grosu@illinois.edu)
Organization: University of Illinois at Urbana-Champaign
Author: Traian Florin Șerbănuță (traian.serbanuta@unibuc.ro)
Organization: University of Bucharest
This is the K semantic definition of the untyped FUN language.
FUN is a pedagogical and research language that captures the essence
of the functional programming paradigm, extended with several features
often encountered in functional programming languages.
Like many functional languages, FUN is an expression language, that
is, everything, including the main program, is an expression.
Functions can be declared anywhere and are first class values in the
language.
FUN is call-by-value here, but it has been extended (as student
homework assignments) with other parameter-passing styles.
To make it more interesting and to highlight some of K's strengths,
FUN includes the following features:
The basic builtin data-types of integers, booleans and strings.
Builtin lists, which can hold any elements, including other lists.
Lists are enclosed in square brackets and their elements are
comma-separated; e.g., [1,2,3].
User-defined data-types, by means of constructor terms.
Constructor names start with a capital letter (while any other
identifier in the language starts with a lowercase letter), and they
can be followed by an arbitrary number of comma-separated arguments
enclosed in parentheses; parentheses are not needed when the
constructor takes no arguments.
For example, Pair(5,7) is a constructor term holding two
numbers, Cons(1,Cons(2,Cons(3,Nil))) is a list-like
constructor term holding 3 elements, and
Tree(Tree(Leaf(1), Leaf(2)), Leaf(3)) is a tree-like
constructor term holding 3 elements.
In the untyped version of the FUN language, no type checking or
inference is performed to ensure that the data constructors are used
correctly.
The execution will simply get stuck when they are misused.
Moreover, since no type checking is performed, the data-types are not
even declared in the untyped version of FUN.
Functions and let/letrec binders can take
multiple space-separated arguments, but these are desugared to
ones that only take one argument, by currying. For example, the
expressions
fun x y -> x y
let x y = y in x
are desugared, respectively, into the following expressions:
fun x -> fun y -> x y
let x = fun y -> y in x
Functions can be defined using pattern matching over the
available data-types. For example, the program
letrec max = fun [h] -> h
| [h|t] -> let x = max t
in if h > x then h else x
in max [1, 3, 5, 2, 4, 0, -1, -5]
defines a function max that calculates the maximum element of
a non-empty list, and the program
letrec ack = fun Pair(0,n) -> n + 1
| Pair(m,0) -> ack Pair(m - 1, 1)
| Pair(m,n) -> ack Pair(m - 1, ack Pair(m, n - 1))
in ack Pair(2,3)
calculates the Ackermann function applied to a particular pair of numbers.
Patterns can be nested. Patterns can currently only be used in function
definitions, and not directly in let/letrec binders.
For example, this is not allowed:
letrec Pair(x,y) = Pair(1,2) in x+y
But this is allowed:
let f Pair(x,y) = x+y in f Pair(1,2)
because it is first reduced to
let f = fun Pair(x,y) -> x+y in f Pair(1,2)
by uncurrying of the let binder, and pattern matching is
allowed in function arguments.
We include a callcc construct, for two reasons: first,
several functional languages support this construct; second, some
semantic frameworks have difficulties defining it. Not K.
Finally, we include mutables by means of referencing an
expression, getting the reference of a variable, dereferencing and
assignment. We include these for the same reasons as above: there are
languages which have them, and they are not easy to define in some
semantic frameworks.
Like in many other languages, some of FUN's constructs can be
desugared into a smaller set of basic constructs. We do that as usual,
using macros, and then we only give semantics to the core constructs.
Note:
We recommend that the reader first consult the dynamic semantics of the
LAMBDA++ language in the first part of the K Tutorial.
To keep the comments below small and focused, we will not re-explain
functional or K features that have already been explained there.
//require "modules/pattern-matching.k"
module FUN-UNTYPED-COMMON
  imports DOMAINS-SYNTAX
FUN is an expression language. The constructs below fall into
several categories: names, arithmetic constructs, conventional
functional constructs, patterns and pattern matching, data constructs,
lists, references, and call-with-current-continuation (callcc).
The arithmetic constructs are standard; they are present in almost all
our K language definitions. The meaning of FUN's constructs is
discussed in more depth when we define their semantics in the next
module.
We start with the syntactic definition of FUN names.
We have several categories of names: ones to be used for functions and
variables, others to be used for data constructors, others for types and
others for type variables. We will introduce them as needed, starting
with the former category. We prefer the names of variables and functions
to start with lower case letters. We take the freedom to tacitly introduce
syntactic lists/sequences for each nonterminal for which we need them:
  syntax Name [token]
  syntax Names ::= List{Name,","}                  [overload(exps)]
Expression constructs will be defined throughout the syntax module.
Below are the very basic ones, namely the builtins, the names, and the
parentheses used as brackets for grouping. Lists of expressions are
declared strict, so all expressions in the list get evaluated whenever
the list is on a position which can be evaluated:
  syntax Exp ::= Int | Bool | String | Name
               | "(" Exp ")"                       [bracket]
  syntax Exps ::= List{Exp,","}                    [strict, overload(exps)]
  syntax Val
  syntax Exp ::= Val
  syntax Exps ::= Vals
  syntax Vals ::= List{Val,","}                    [overload(exps)]
  syntax Bottom
  syntax Bottoms ::= List{Bottom,","}              [overload(exps)]
We next define the syntax of arithmetic constructs, together with
their relative priorities and left-/non-associativities. We also
tag all these rules as members of a new group, "arith", so we can more easily
define global syntax priorities later (at the end of the syntax module).
  syntax Exp ::= left:
                 Exp "*" Exp                       [strict, group(arith)]
               | Exp "/" Exp                       [strict, group(arith)]
               | Exp "%" Exp                       [strict, group(arith)]
               > left:
                 Exp "+" Exp                       [strict, left, group(arith)]
               | Exp "^" Exp                       [strict, left, group(arith)]
// left attribute should not be necessary; currently a parsing bug
               | Exp "-" Exp                       [strict, prefer, group(arith)]
// the "prefer" attribute above is to not parse x-1 as x(-1)
// Due to some parsing problems, we currently cannot add unary minus:
               | "-" Exp                           [strict, group(arith)]
               > non-assoc:
                 Exp "<" Exp                       [strict, group(arith)]
               | Exp "<=" Exp                      [strict, group(arith)]
               | Exp ">" Exp                       [strict, group(arith)]
               | Exp ">=" Exp                      [strict, group(arith)]
               | Exp "==" Exp                      [strict, group(arith)]
               | Exp "!=" Exp                      [strict, group(arith)]
               > "!" Exp                           [strict, group(arith)]
               > Exp "&&" Exp                      [strict(1), left, group(arith)]
               > Exp "||" Exp                      [strict(1), left, group(arith)]
The conditional construct has the expected evaluation strategy,
stating that only the first argument is evaluated:
syntax Exp ::= "if" Exp "then" Exp "else" Exp [strict(1)]
FUN's builtin lists are formed by enclosing comma-separated
sequences of expressions (i.e., terms of sort Exps) in square
brackets. The list constructor cons adds a new element to the
top of the list, head and tail get the first element
and the tail sublist of a list if they exist, respectively, and get
stuck otherwise, and null? tests whether a list is empty or
not; syntactically, these are just expression constants.
In function patterns, we are also going to allow patterns following the
usual head/tail notation; for example, the pattern [x_1,...,x_n|t]
binds x_1, ..., x_n to the first elements of the matched list,
and t to the list formed with the remaining elements. We define list
patterns as ordinary expression constructs, although we will make sure that
we do not give them semantics if they appear in any place other than a
function case pattern.
  syntax Exp ::= "[" Exps "]"                      [strict, klabel(list)]
               | "head" [macro]
               | "tail" [macro]
               | "null?" [macro]
               | "[" Exps "|" Exp "]"
  syntax Val ::= "[" Vals "]"                      [klabel(list)]
  syntax Cons ::= "cons"
  syntax Val ::= Cons
  syntax Val ::= Cons Val                          [klabel(apply)]
Data constructors start with capital letters and they may or may
not have arguments. We need to use the attribute "prefer" to make
sure that, e.g., Cons(a) parses as constructor Cons with
argument a, and not as the expression Cons (because
constructor names are also expressions) regarded as a function applied
to the expression a. Also, note that the constructor is strict
in its second argument, because we want to evaluate its arguments but
not the constructor name itself.
  syntax ConstructorName [token]
  syntax Exp ::= ConstructorName
               | ConstructorName "(" Exps ")"      [prefer, strict(2), klabel(constructor)]
  syntax Val ::= ConstructorName "(" Vals ")"      [klabel(constructor)]
A function is essentially a |-separated ordered
sequence of cases, each case of the form pattern -> expression,
preceded by the language construct fun. Patterns will be defined
shortly, both for the builtin lists and for user-defined constructors.
Recall that the syntax we define in K is not meant to serve as an
ultimate parser for the defined language, but rather as a convenient
notation for K abstract syntax trees, which we prefer when we write
the semantic rules. It is therefore often the case that we define a
more "generous" syntax than we want to allow programs to use.
We do it here, too. Specifically, the syntax of Cases
below allows any expression to appear as a pattern. This syntactic
relaxation permits many wrong programs to be parsed, but that is not a
problem because we are not going to give semantics to wrong combinations,
so those programs will get stuck; moreover, our type inferencer will reject
those programs anyway. Function application is just concatenation of
expressions, without worrying about type correctness. Again, the type
system will reject type-incorrect programs.
syntax Exp ::= "fun" Cases
             | Exp Exp [strict, left, klabel(apply)]
// NOTE: We would like eventually to also have Exp "(" Exps ")"
syntax Case ::= Exp "->" Exp
syntax Cases ::= List{Case, "|"}
The let and letrec binders have the usual syntax
and functional meaning. We allow multiple and-separated bindings.
Like for the function cases above, we allow a more generous syntax for
the left-hand sides of bindings, noting that the semantics will get stuck
on incorrect bindings and that the type system will reject those programs.
syntax Exp ::= "let" Bindings "in" Exp
             | "letrec" Bindings "in" Exp [prefer]
// The "prefer" attribute for letrec is currently needed due to a tool bug,
// to make sure that "letrec" is not parsed as "let rec".
syntax Binding ::= Exp "=" Exp
syntax Bindings ::= List{Binding,"and"}
References are first class values in FUN. The construct ref
takes an expression, evaluates it, and then stores the resulting value
at a fresh location in the store and returns that reference. Syntactically,
ref is just an expression constant. The construct &
takes a name as argument and evaluates to a reference, namely the store
reference where the variable passed as argument stores its value; this
construct is a bit controversial and is further discussed in the
environment-based semantics of the FUN language, where we desugar
ref to it. The construct @ takes a reference
and evaluates to the value stored there. The construct := takes
two expressions, the first expected to evaluate to a reference; the value
of its second argument will be stored at the location to which the first
points (the old value is thus lost). Finally, since expression evaluation
now has side effects, it makes sense to also add a sequential composition
construct, which is sequentially strict. This evaluates to the value of
its second argument; the value of the first argument is lost (which has
therefore been evaluated only for its side effects).
syntax Exp ::= "ref" [macro] | "&" Name | "@" Exp [strict] | Exp ":=" Exp [strict] | Exp ";" Exp [strict(1), right]
Call-with-current-continuation, named callcc in FUN, is a
powerful control operator that originated in the Scheme programming
language, but it now exists in many other functional languages. It works
by evaluating its argument, expected to evaluate to a function, and by
passing the current continuation, or evaluation context (or computation,
in K terminology), as a special value to it. When/If this special value
is invoked, the current context is discarded and replaced with the one
held by the special value and the computation continues from there.
It is like taking a snapshot of the execution context at some moment
in time and then, when desired, being able to get back in time to that
point. If you like games, it is like saving the game now (so you can
work on your homework!) and then continuing the game tomorrow or whenever
you wish. To illustrate the strength of callcc, we also
allow exceptions in FUN by means of a conventional try-catch
construct, which will desugar to callcc. We also need to
introduce the special expression constant throw, but we need to
use it as a function argument name in the desugaring macro, so we define
it as a name instead of as an expression constant:
syntax Exp ::= "try" Exp "catch" "(" Name ")" Exp [macro] syntax Val ::= "callcc" syntax Name ::= "throw" [token]
Finally, FUN also allows polymorphic datatype declarations. These
will be useful when we define the type system later on.
syntax Exp ::= "datatype" Type "=" TypeCases Exp [macro]
// NOTE: In a future version of K, we want the datatype declaration
// to be a construct by itself, but that is not possible currently
// because K's parser wrongly identifies the __ operation allowing
// a declaration to appear in front of an expression with the function
// application construct, giving ambiguous parsing errors.
We next need to define the syntax of types and type cases that appear
in datatype declarations.
Like in many functional languages, type parameters/variables in
user-defined types are quoted identifiers.
syntax TypeVar [token] syntax TypeVars ::= List{TypeVar,","} [overload(types)]
Types can be basic types, function types, or user-defined
parametric types. In the dynamic semantics we are going to simply ignore
all the type declarations, so the syntax of types below is only useful
for generating the desired parser. To avoid syntactic ambiguities with
the arrow construct for function cases, we use the symbol --> as
a constructor for function types:
syntax TypeName [token] syntax Type ::= "int" | "bool" | "string" | Type "-->" Type [right] | "(" Type ")" [bracket] | TypeVar | TypeName [klabel(TypeName), avoid] | Type TypeName [klabel(Type-TypeName), symbol, macro] | "(" Types ")" TypeName [prefer] syntax Types ::= List{Type,","} [overload(types)] syntax Types ::= TypeVars syntax TypeCase ::= ConstructorName | ConstructorName "(" Types ")" syntax TypeCases ::= List{TypeCase,"|"} [symbol(_|TypeCase_)]
syntax priority @__FUN-UNTYPED-COMMON > apply > arith > _:=__FUN-UNTYPED-COMMON > let_in__FUN-UNTYPED-COMMON letrec_in__FUN-UNTYPED-COMMON if_then_else__FUN-UNTYPED-COMMON > _;__FUN-UNTYPED-COMMON > fun__FUN-UNTYPED-COMMON > datatype_=___FUN-UNTYPED-COMMON endmodule module FUN-UNTYPED-MACROS imports FUN-UNTYPED-COMMON
We desugar the list non-constructor operations to functions matching
over list patterns. In order to do that we need some new variables; for
those, we follow the same convention as in the K tutorial, where we
added them as new identifier constructs starting with the character $,
so we can easily recognize them when we debug or trace the semantics.
syntax Name ::= "$h" [token] | "$t" [token] rule head => fun [$h|$t] -> $h rule tail => fun [$h|$t] -> $t rule null? => fun [.Exps] -> true | [$h|$t] -> false
Multiple-head list patterns desugar into successive one-head patterns:
rule [E1,E2,Es:Exps|T] => [E1|[E2,Es|T]] [anywhere]
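The effect of this macro can be modeled outside K. The following is a minimal Python sketch, not K code; the tuple encoding ("cons_pat", head, tail) and the name desugar are assumptions made only for illustration.

```python
# Hypothetical encoding (not K's own): the one-head pattern [h|t] is the
# tuple ("cons_pat", h, t).

def desugar(heads, tail):
    """Rewrite [h1,h2,...,hn|t] into nested one-head patterns [h1|[h2|...[hn|t]]]."""
    if not heads:
        raise ValueError("a head/tail pattern needs at least one head")
    if len(heads) == 1:
        return ("cons_pat", heads[0], tail)
    # Peel off the first head and desugar the rest, as the macro does
    # each time it fires on a pattern with two or more heads.
    return ("cons_pat", heads[0], desugar(heads[1:], tail))


print(desugar(["x", "y", "z"], "t"))
```

Each recursive call corresponds to one application of the rule, so a pattern with n heads is rewritten n-1 times.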
Uncurrying of multiple arguments in functions and binders:
rule P1 P2 -> E => P1 -> fun P2 -> E [anywhere] rule F P = E => F = fun P -> E [anywhere]
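As an illustration of the first rule, here is a small Python sketch (not K) that turns a case with several argument patterns into nested one-argument functions. The tuple encodings ("case", pattern, body) and ("fun", cases), and the name curry_case, are hypothetical.

```python
# Hypothetical AST encoding: ("case", pattern, body) and ("fun", [cases]).

def curry_case(patterns, body):
    """Turn `p1 p2 ... pn -> body` into `p1 -> fun p2 -> ... fun pn -> body`."""
    *init, last = patterns
    result = ("case", last, body)
    # Wrap from the innermost pattern outwards.
    for p in reversed(init):
        result = ("case", p, ("fun", [result]))
    return result


print(curry_case(["x", "y"], "E"))
```

The binder rule F P = E => F = fun P -> E follows the same idea, moving argument patterns from the left-hand side of the binding into a function on the right-hand side.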
We desugar the try-catch construct into callcc:
syntax Name ::= "$k" [token] | "$v" [token] rule try E catch(X) E' => callcc (fun $k -> (fun throw -> E)(fun X -> $k E'))
For uniformity, we reduce all types to their general form:
rule `Type-TypeName`(T:Type, Tn:TypeName) => (T) Tn
The dynamic semantics ignores all the type declarations:
rule datatype _T = _TCs E => E endmodule module FUN-UNTYPED-SYNTAX imports FUN-UNTYPED-COMMON imports BUILTIN-ID-TOKENS syntax Name ::= r"[a-z][_a-zA-Z0-9]*" [token, prec(2)] | #LowerId [token] syntax ConstructorName ::= #UpperId [token] syntax TypeVar ::= r"['][a-z][_a-zA-Z0-9]*" [token] syntax TypeName ::= Name [token] endmodule
The semantics below is environment-based. A substitution-based
definition of FUN is also available, but that drops the & construct
as explained above.
module FUN-UNTYPED imports FUN-UNTYPED-COMMON imports FUN-UNTYPED-MACROS imports DOMAINS //imports PATTERN-MATCHING
The k, env, and store cells are standard
(see, for example, the definition of LAMBDA++ or IMP++ in the first
part of the K tutorial).
configuration <T color="yellow"> <k color="green"> $PGM:Exp </k> <env color="violet"> .Map </env> <store color="white"> .Map </store> </T>
We only define integers, Booleans and strings as values here, but will
add more values later.
syntax Val ::= Int | Bool | String syntax Vals ::= Bottoms syntax KResult ::= Val
rule <k> X:Name => V ...</k> <env>... X |-> L ...</env> <store>... L |-> V ...</store>
rule I1 * I2 => I1 *Int I2 rule I1 / I2 => I1 /Int I2 requires I2 =/=K 0 rule I1 % I2 => I1 %Int I2 requires I2 =/=K 0 rule I1 + I2 => I1 +Int I2 rule S1 ^ S2 => S1 +String S2 rule I1 - I2 => I1 -Int I2 rule - I => 0 -Int I rule I1 < I2 => I1 <Int I2 rule I1 <= I2 => I1 <=Int I2 rule I1 > I2 => I1 >Int I2 rule I1 >= I2 => I1 >=Int I2 rule V1:Val == V2:Val => V1 ==K V2 rule V1:Val != V2:Val => V1 =/=K V2 rule ! T => notBool(T) rule true && E => E rule false && _ => false rule true || _ => true rule false || E => E
rule if true then E else _ => E rule if false then _ else E => E
We have already declared the syntactic list of expressions strict, so
we can assume that all the elements that appear in a FUN list are
evaluated. The only thing left to do is to state that a list of
values is a value itself, that is, that the list square-bracket
construct is indeed a constructor, and to give the semantics of
cons. Since cons is a builtin function and is
expected to take two arguments, we have to also state that
cons itself is a value (specifically, a function/closure
value, but we do not need that level of detail here), and also that
cons applied to a value is a value (specifically, it would be
a function/closure value that expects the second, list argument):
rule cons V:Val [Vs:Vals] => [V,Vs]
Constructors take values as arguments and produce other values:
syntax Val ::= ConstructorName
Like in the environment-based semantics of LAMBDA++ in the first part
of the K tutorial, functions evaluate to closures. A closure includes
the current environment besides the function contents; the environment
will be used at execution time to lookup all the variables that appear
free in the function body (we want static scoping in FUN).
syntax Val ::= closure(Map,Cases) rule <k> fun Cases => closure(Rho,Cases) ...</k> <env> Rho </env>
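To see why capturing the environment gives static scoping, consider the following Python sketch. It is a simplification and not the K semantics: environments here map names directly to values (FUN maps names to store locations), closures have a single parameter instead of a list of cases, and the names Closure, make_fun and apply_closure are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Closure:
    env: dict          # environment captured at definition time (Rho)
    param: str         # single parameter name (FUN allows pattern cases)
    body: Callable     # Python callable standing in for the function body


def make_fun(env, param, body):
    # Copy the environment: later changes at the definition site must not leak in.
    return Closure(dict(env), param, body)


def apply_closure(closure, arg):
    # Evaluate the body in the captured environment extended with the argument,
    # not in the caller's environment.
    call_env = {**closure.env, closure.param: arg}
    return closure.body(call_env)


env = {"x": 1}
add_x = make_fun(env, "y", lambda e: e["x"] + e["y"])
env["x"] = 100                      # the caller's x changes afterwards
print(apply_closure(add_x, 2))      # uses the captured x
```

With dynamic scoping the call would see x = 100; because the closure carries its own environment, the body still sees x = 1.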
Note: The reader may want to get familiar with
how the pre-defined pattern matching works before proceeding.
The best way to do that is to consult
k/include/modules/pattern-matching.k.
We distinguish two cases when the closure is applied.
If the first pattern matches, then we pick the first case: switch to
the closed environment, get the matching map and bind all its
variables, and finally evaluate the function body of the first case,
making sure that the environment is properly recovered afterwards.
If the first pattern does not match, then we drop it and thus move on
to the next one.
rule (.K => getMatching(P, V)) ~> closure(_, P->_ | _) V:Val
rule <k> matchResult(M:Map) ~> closure(Rho, _->E | _) _
      => bindMap(M) ~> E ~> setEnv(Rho') ...</k>
     <env> Rho' => Rho </env>
rule (matchFailure => .K) ~> closure(_, (_->_ | Cs:Cases => Cs)) _
// rule <k> closure(Rho, P->E | _) V:Val
//       => bindMap(getMatching(P,V)) ~> E ~> setEnv(Rho') ...</k>
//      <env> Rho' => Rho </env> when isMatching(P,V)
// rule closure(_, (P->_ | Cs:Cases => Cs)) V:Val when notBool isMatching(P,V)
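The case-selection strategy described above (try the first case, drop it on failure, move on to the next) can be sketched in Python as follows. This is an illustrative model, not the K rules: the matcher is deliberately tiny (strings are pattern variables, anything else is a literal), bodies are Python callables, and all names are hypothetical.

```python
MATCH_FAILURE = None  # stands in for matchFailure


def get_matching(pattern, value):
    """Tiny matcher: a str pattern is a variable and binds; other patterns must equal the value."""
    if isinstance(pattern, str):
        return {pattern: value}
    return {} if pattern == value else MATCH_FAILURE


def apply_closure(closure_env, cases, value):
    """Try cases in order; the first matching case runs in the closure's environment."""
    for pattern, body in cases:
        bindings = get_matching(pattern, value)
        if bindings is not MATCH_FAILURE:
            # Switch to the closed environment and add the pattern bindings.
            return body({**closure_env, **bindings})
        # Otherwise drop this case and continue with the remaining ones.
    raise RuntimeError("stuck: no case matches")


cases = [
    (0, lambda env: "zero"),
    ("n", lambda env: env["n"] + env["k"]),
]
print(apply_closure({"k": 10}, cases, 0))
print(apply_closure({"k": 10}, cases, 5))
```

In the K definition the caller's environment is restored afterwards by setEnv; the sketch gets the same effect for free because it never mutates the caller's environment.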
To highlight the similarities and differences between let and
letrec, we prefer to give them direct semantics instead of
desugaring them as in LAMBDA. See the formal definitions of
bindTo, bind, and assignTo at the end of
this module. Informally, bindTo(Xs, Es) first
evaluates the expressions Es (of sort Exps) in the current
environment (i.e., it is strict in its second argument), then it binds
the variables in Xs (of sort Names) to new locations and adds
those bindings to the environment, and finally writes the values
previously obtained after evaluating the expressions Es to those
new locations; bind(Xs) only binds
Xs to new locations and adds those bindings to the environment;
and assignTo(Xs,Es) evaluates the expressions
Es in the current environment and then writes the resulting
values to the locations to which the variables Xs are already
bound in the environment.
Therefore, let Xs = Es in E first
evaluates Es in the current environment, then adds new
bindings for Xs to fresh locations in the environment, then
writes the values of Es to those locations, and finally
evaluates E in the new environment, making sure that the
environment is properly recovered after the evaluation of E.
On the other hand, letrec does the same things but in a
different order: it first adds new bindings for Xs to fresh
locations in the environment, then it evaluates Es in the new
environment, then it writes the resulting values to their
corresponding locations, and finally it evaluates E and
recovers the environment. The crucial difference is that the
expressions Es now see the locations of the variables Xs
in the environment, so if they are functions, which is typically the
case with letrec, their closures will encapsulate in their
environments the bindings of all the bound variables, including
themselves (thus, we may have a closure value stored at location
L, whose environment contains a binding of the form
F ↦ L; this way, the closure can invoke itself).
rule <k> let Bs in E => bindTo(names(Bs),exps(Bs)) ~> E ~> setEnv(Rho) ...</k> <env> Rho </env> rule <k> letrec Bs in E => bind(names(Bs))~>assignTo(names(Bs),exps(Bs))~>E~>setEnv(Rho)...</k> <env> Rho </env>
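The ordering difference between the two rules can be made concrete with a small Python model. This is a sketch under simplifying assumptions, not the K definition: expressions are Python callables taking the machine, "building a closure" is reduced to taking a snapshot of the environment, and the class and method names are hypothetical.

```python
class Machine:
    def __init__(self):
        self.env = {}       # Name -> location
        self.store = {}     # location -> value
        self.next_loc = 0

    def fresh(self):
        self.next_loc += 1
        return self.next_loc

    def bind(self, names):
        # Bind each name to a fresh location (no value stored yet).
        for x in names:
            self.env = {**self.env, x: self.fresh()}

    def assign_to(self, names, values):
        # Write values to the locations the names are already bound to.
        for x, v in zip(names, values):
            self.store[self.env[x]] = v

    def let(self, names, exps, body):
        saved = self.env
        values = [e(self) for e in exps]    # evaluated in the OLD environment
        self.bind(names)
        self.assign_to(names, values)
        result = body(self)
        self.env = saved                    # plays the role of setEnv(Rho)
        return result

    def letrec(self, names, exps, body):
        saved = self.env
        self.bind(names)                    # bindings come first
        values = [e(self) for e in exps]    # evaluated in the NEW environment
        self.assign_to(names, values)
        result = body(self)
        self.env = saved
        return result


def capture_env(machine):
    """Stands in for evaluating `fun ...`: a closure snapshots the environment."""
    return dict(machine.env)


def read_f(machine):
    loc = machine.env["f"]
    return machine.store[loc], loc


m = Machine()
let_snapshot, _ = m.let(["f"], [capture_env], read_f)
rec_snapshot, rec_loc = m.letrec(["f"], [capture_env], read_f)
print("f" in let_snapshot)            # the let-bound closure cannot see f
print(rec_snapshot["f"] == rec_loc)   # the letrec-bound closure maps f to its own location
```

The second result is the F ↦ L situation described above: the value stored at location L carries an environment that points back to L.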
Recall that our syntax allows let and letrec to
take any expression in place of its binding. This allows us to use
the already existing function application construct to bind names to
functions, such as, e.g., let x y = y in ... .
The desugaring macro in the syntax module uncurries such declarations,
and then the semantic rules above only work when the remaining
bindings are identifiers, so the semantics will get stuck on programs
that misuse the let and letrec binders.
The semantics of references is self-explanatory, except maybe for the
desugaring rule of ref, which is discussed further below. Note
that &X grabs the location of X from the environment.
Sequential composition, which is needed only to accumulate the
side effects due to assignments, was declared strict in its first argument.
Once evaluated, its first argument is simply discarded:
syntax Name ::= "$x" [token] rule ref => fun $x -> & $x rule <k> & X => L ...</k> <env>... X |-> L ...</env> rule <k> @ L:Int => V:Val ...</k> <store>... L |-> V ...</store> rule <k> L:Int := V:Val => V ...</k> <store>... L |-> (_=>V) ...</store> rule _V:Val; E => E
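The following Python sketch models these rules on a store of numbered locations. It is an illustration only, not the K semantics: the function names ref, deref and assign are hypothetical, and the environment is reduced to the single binding needed to mimic the desugaring ref => fun $x -> & $x.

```python
store = {}  # location -> value


def ref(value):
    """Model of `ref V`: bind $x to a fresh location holding V, then return & $x."""
    loc = len(store)          # fresh, because locations are never removed
    env = {"$x": loc}         # applying `fun $x -> ...` binds the parameter
    store[loc] = value
    return env["$x"]          # `& $x` looks up the location of $x


def deref(loc):
    """Model of `@ L`: read the value stored at location L."""
    return store[loc]


def assign(loc, value):
    """Model of `L := V`: overwrite the old value and evaluate to V."""
    store[loc] = value
    return value


r = ref(5)
assign(r, deref(r) + 1)
print(deref(r))
```

Note that assign returns the stored value, matching the rule L := V => V, so an assignment can be used as a subexpression.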
The desugaring rule of ref (first rule above) works
because & takes a variable and returns its location (like in C).
Note that some "pure" functional programming researchers strongly dislike
the & construct, but favor ref. We refrain from having
a personal opinion on this issue here, but support & in the
environment-based definition of FUN because it is, technically speaking,
more powerful than ref. From a language design perspective, it
would be equally easy to drop & and instead give a direct
semantics to ref. In fact, this is precisely what we do in the
substitution-based definition of FUN, because there appears to be no way
to give a substitution-based definition to the & construct.
As we know from the LAMBDA++ tutorial, call-with-current-continuation
is quite easy to define in K. We first need to define a special
value wrapping an execution context, that is, an environment saying
where the variables should be looked up, and a computation structure
saying what is left to execute (in a substitution-based definition,
this special value would be even simpler, as it would only need to
wrap the computation structure; see, for example, the
substitution-based semantics of LAMBDA++ in the first part of the
K tutorial, or the substitution-based definition of FUN). Then
callcc creates such a value containing the current
environment and the current remaining computation, and passes it to
its argument function. When/If invoked, the special value replaces
the current execution context with its own and continues the execution
normally.
syntax Val ::= cc(Map,K) rule <k> (callcc V:Val => V cc(Rho,K)) ~> K </k> <env> Rho </env> rule <k> cc(Rho,K) V:Val ~> _ => V ~> K </k> <env> _ => Rho </env>
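Python has no first-class continuations, but the two rules can be mimicked in continuation-passing style, where every function receives "the rest of the computation" as an explicit argument k. The sketch below is an analogy rather than the K semantics: it ignores the environment component of cc, and the names callcc, cc and example are hypothetical.

```python
def callcc(f, k):
    """Pass the current continuation k to f, wrapped as a callable value."""
    def cc(value, _discarded_k):
        # Invoking cc drops the continuation at the call site (_discarded_k)
        # and resumes the captured one, like the second K rule.
        return k(value)
    return f(cc, k)


def example(invoke, k):
    """Models 1 + callcc (fun c -> 10 + (c 5)) when invoke is True,
    and 1 + callcc (fun c -> 10 + 5) when invoke is False."""
    def after_callcc(v):
        return k(1 + v)

    def body(c, k2):
        add_ten = lambda v: k2(10 + v)
        if invoke:
            return c(5, add_ten)   # the pending "10 + _" is discarded
        return add_ten(5)

    return callcc(body, after_callcc)


identity = lambda v: v
print(example(True, identity))
print(example(False, identity))
```

When the continuation is invoked, the pending addition of 10 never happens; when it is not, callcc behaves like an ordinary function call.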
The environment recovery operation is the same as for the LAMBDA++
language in the K tutorial and many other languages provided with the
K distribution. The first "anywhere" rule below (currently commented
out) shows an elegant way to achieve the benefits of tail recursion in K.
syntax KItem ::= setEnv(Map)
// TODO: get rid of env
//rule (setEnv(_) => .) ~> setEnv(_) [anywhere]
rule <k> _:Val ~> (setEnv(Rho) => .K) ...</k> <env> _ => Rho </env>
bindTo, bind and assignTo
The meaning of these operations has already been explained when we
discussed the let and letrec language constructs above.
syntax KItem ::= bindTo(Names,Exps) [strict(2)] | bindMap(Map) | bind(Names) rule (.K => getMatchingAux(Xs,Vs)) ~> bindTo(Xs:Names,Vs:Vals) rule matchResult(M:Map) ~> bindTo(_:Names, _:Vals) => bindMap(M) rule bindMap(.Map) => .K rule <k> bindMap((X:Name |-> V:Val => .Map) _:Map) ...</k> <env> Rho => Rho[X <- !L:Int] </env> <store>... .Map => !L |-> V ...</store> rule bind(.Names) => .K rule <k> bind(X:Name,Xs => Xs) ...</k> <env> Rho => Rho[X <- !_L:Int] </env> syntax KItem ::= assignTo(Names,Exps) [strict(2)] rule <k> assignTo(.Names,.Vals) => .K ...</k> rule <k> assignTo((X:Name,Xs => Xs),(V:Val,Vs:Vals => Vs)) ...</k> <env>... X |-> L ...</env> <store>... .Map => L |-> V ...</store>
The following auxiliary operations extract the list of identifiers
and of expressions in a binding, respectively.
syntax Names ::= names(Bindings) [function]
rule names(.Bindings) => .Names
rule names(X:Name=_ and Bs) => (X,names(Bs))::Names
syntax Exps ::= exps(Bindings) [function]
rule exps(.Bindings) => .Exps
rule exps(_:Name=E and Bs) => E,exps(Bs)
/* Extra kore stuff */
syntax KResult ::= Vals
syntax Exps ::= Names
syntax Names ::= Bottoms
/* Matching */
syntax MatchResult ::= getMatching(Exp, Val) [function]
                     | getMatchingAux(Exps, Vals) [function]
                     | mergeMatching(MatchResult, MatchResult) [function]
                     | matchResult(Map)
                     | "matchFailure"
rule getMatching(C:ConstructorName(Es:Exps), C(Vs:Vals)) => getMatchingAux(Es, Vs)
rule getMatching([Es:Exps], [Vs:Vals]) => getMatchingAux(Es, Vs)
rule getMatching(C:ConstructorName, C) => matchResult(.Map)
rule getMatching(B:Bool, B) => matchResult(.Map)
rule getMatching(I:Int, I) => matchResult(.Map)
rule getMatching(S:String, S) => matchResult(.Map)
rule getMatching(N:Name, V:Val) => matchResult(N |-> V)
rule getMatching(_, _) => matchFailure [owise]
rule getMatchingAux((E:Exp, Es:Exps), (V:Val, Vs:Vals))
  => mergeMatching(getMatching(E, V), getMatchingAux(Es, Vs))
rule getMatchingAux(.Exps, .Vals) => matchResult(.Map)
rule getMatchingAux(_, _) => matchFailure [owise]
rule mergeMatching(matchResult(M1:Map), matchResult(M2:Map)) => matchResult(M1 M2)
  requires intersectSet(keys(M1), keys(M2)) ==K .Set
//rule mergeMatching(_, _) => matchFailure [owise]
rule mergeMatching(matchResult(_:Map), matchFailure) => matchFailure
rule mergeMatching(matchFailure, matchResult(_:Map)) => matchFailure
rule mergeMatching(matchFailure, matchFailure) => matchFailure
Besides the generic decomposition rules for patterns and values,
we also want to allow [head|tail] matching for lists, so we add
the following custom pattern decomposition rule:
rule getMatching([H:Exp | T:Exp], [V:Val, Vs:Vals]) => getMatchingAux((H, T), (V, [Vs])) endmodule
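The matching functions above, including the [head|tail] rule, can be approximated in Python as follows. This is a sketch, not a translation: constructor patterns are omitted, the Name class and the ("|", head, tail) tuple are hypothetical encodings, and overlapping bindings are reported as failure, whereas the K definition has a side condition and no fallback rule for that case.

```python
FAIL = None  # stands in for matchFailure


class Name(str):
    """Marks a pattern variable, to tell it apart from a string literal."""


def merge_matching(m1, m2):
    if m1 is FAIL or m2 is FAIL:
        return FAIL
    if m1.keys() & m2.keys():
        return FAIL   # assumption: treat non-disjoint bindings as failure
    return {**m1, **m2}


def get_matching_aux(pats, vals):
    if not pats and not vals:
        return {}
    if not pats or not vals:
        return FAIL   # different lengths
    return merge_matching(get_matching(pats[0], vals[0]),
                          get_matching_aux(pats[1:], vals[1:]))


def get_matching(pat, val):
    if isinstance(pat, Name):                        # a name binds any value
        return {pat: val}
    if isinstance(pat, tuple) and len(pat) == 3 and pat[0] == "|":
        _, head, tail = pat                          # the [H|T] pattern
        if isinstance(val, list) and val:
            return get_matching_aux([head, tail], [val[0], val[1:]])
        return FAIL
    if isinstance(pat, list):                        # the [Es] pattern
        return get_matching_aux(pat, val) if isinstance(val, list) else FAIL
    # Literals must have the same type and be equal (True is not 1 here).
    return {} if type(pat) is type(val) and pat == val else FAIL


print(get_matching(("|", Name("h"), Name("t")), [1, 2, 3]))
print(get_matching([Name("x"), 2], [1, 3]))
```

As in the K rule, the tail variable is matched against the list of remaining values, so t is bound to a list rather than to a single element.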
// NOTE: this definition is not up to date with the latest version of K, as it
// uses both substitution and symbolic reasoning.
// It is intended for documentation and academic purposes only.
Author: Grigore Roșu (grosu@illinois.edu)
Organization: University of Illinois at Urbana-Champaign
Author: Traian Florin Șerbănuță (traian.serbanuta@unibuc.ro)
Organization: University of Bucharest
This is the substitution-based definition of FUN. For additional
explanations regarding the semantics of the various FUN constructs,
the reader should consult the environment-based definition of FUN.
requires "substitution.md"
//requires "modules/pattern-matching.k"
module FUN-UNTYPED-COMMON
  imports DOMAINS-SYNTAX
syntax Name syntax Names ::= List{Name,","} syntax Exp ::= Int | Bool | String | Name | "(" Exp ")" [bracket] syntax Exps ::= List{Exp,","} [strict] syntax Val syntax Vals ::= List{Val,","} syntax Exp ::= left: Exp "*" Exp [strict, arith] | Exp "/" Exp [strict, arith] | Exp "%" Exp [strict, arith] > left: Exp "+" Exp [strict, left, arith] | Exp "^" Exp [strict, left, arith] | Exp "-" Exp [strict, prefer, arith] | "-" Exp [strict, arith] > non-assoc: Exp "<" Exp [strict, arith] | Exp "<=" Exp [strict, arith] | Exp ">" Exp [strict, arith] | Exp ">=" Exp [strict, arith] | Exp "==" Exp [strict, arith] | Exp "!=" Exp [strict, arith] > "!" Exp [strict, arith] > Exp "&&" Exp [strict(1), left, arith] > Exp "||" Exp [strict(1), left, arith] syntax Exp ::= "if" Exp "then" Exp "else" Exp [strict(1)] syntax Exp ::= "[" Exps "]" [strict] | "cons" | "head" | "tail" | "null?" | "[" Exps "|" Exp "]" syntax Val ::= "[" Vals "]" syntax ConstructorName syntax Exp ::= ConstructorName | ConstructorName "(" Exps ")" [prefer, strict(2)] syntax Val ::= ConstructorName "(" Vals ")" syntax Exp ::= "fun" Cases | Exp Exp [strict, left] syntax Case ::= Exp "->" Exp [binder] // NOTE: The binder attribute above is the only difference between this // module and the syntax module of environment-based FUN. We need // to fix a bug in order to import modules and override the attributes // of operations. 
syntax Cases ::= List{Case, "|"} syntax Exp ::= "let" Bindings "in" Exp | "letrec" Bindings "in" Exp [prefer] syntax Binding ::= Exp "=" Exp syntax Bindings ::= List{Binding,"and"} syntax Exp ::= "ref" | "&" Name | "@" Exp [strict] | Exp ":=" Exp [strict] | Exp ";" Exp [strict(1), right] syntax Exp ::= "callcc" | "try" Exp "catch" "(" Name ")" Exp syntax Name ::= "throw" [token] syntax Exp ::= "datatype" Type "=" TypeCases Exp syntax TypeVar syntax TypeVars ::= List{TypeVar,","} syntax TypeName syntax Type ::= "int" | "bool" | "string" | Type "-->" Type [right] | "(" Type ")" [bracket] | TypeVar | TypeName [klabel(TypeName), avoid] | Type TypeName [klabel(Type-TypeName), onlyLabel] | "(" Types ")" TypeName [prefer] syntax Types ::= List{Type,","} syntax Types ::= TypeVars syntax TypeCase ::= ConstructorName | ConstructorName "(" Types ")" syntax TypeCases ::= List{TypeCase,"|"} [klabel(_|TypeCase_)]
syntax priority @__FUN-UNTYPED-COMMON > ___FUN-UNTYPED-COMMON > arith > _:=__FUN-UNTYPED-COMMON > let_in__FUN-UNTYPED-COMMON letrec_in__FUN-UNTYPED-COMMON if_then_else__FUN-UNTYPED-COMMON > _;__FUN-UNTYPED-COMMON > fun__FUN-UNTYPED-COMMON > datatype_=___FUN-UNTYPED-COMMON endmodule module FUN-UNTYPED-MACROS imports FUN-UNTYPED-COMMON
rule P1 P2 -> E => P1 -> fun P2 -> E [macro-rec] rule F P = E => F = fun P -> E [macro-rec] rule [E1,E2,Es:Exps|T] => [E1|[E2,Es|T]] [macro-rec] // rule 'TypeName(Tn:TypeName) => (.TypeVars) Tn [macro] rule `Type-TypeName`(T:Type, Tn:TypeName) => (T) Tn [macro] syntax Name ::= "$h" | "$t" rule head => fun [$h|$t] -> $h [macro] rule tail => fun [$h|$t] -> $t [macro] rule null? => fun [.Exps] -> true | [$h|$t] -> false [macro] syntax Name ::= "$k" | "$v" rule try E catch(X) E' => callcc (fun $k -> (fun throw -> E)(fun X -> $k E')) [macro] rule datatype _T = _TCs E => E [macro]
mu is needed for letrec, but we put it here so that we can also write
programs with mu in them, which is particularly useful for testing.
syntax Exp ::= "mu" Case endmodule module FUN-UNTYPED-SYNTAX imports FUN-UNTYPED-COMMON imports BUILTIN-ID-TOKENS syntax Name ::= r"[a-z][_a-zA-Z0-9]*" [token, prec(2)] | #LowerId [token] syntax ConstructorName ::= #UpperId [token] syntax TypeVar ::= r"['][a-z][_a-zA-Z0-9]*" [token] syntax TypeName ::= Name [token] endmodule
module FUN-UNTYPED imports FUN-UNTYPED-COMMON imports FUN-UNTYPED-MACROS imports DOMAINS imports SUBSTITUTION //imports PATTERN-MATCHING configuration <T color="yellow"> <k color="green"> $PGM:Exp </k> <store color="white"> .Map </store> </T>
Both Name and functions are values now:
syntax Val ::= Int | Bool | String | Name syntax Exp ::= Val syntax Exps ::= Vals syntax KResult ::= Val syntax Exps ::= Names syntax Vals ::= Names rule I1 * I2 => I1 *Int I2 rule I1 / I2 => I1 /Int I2 when I2 =/=K 0 rule I1 % I2 => I1 %Int I2 when I2 =/=K 0 rule I1 + I2 => I1 +Int I2 rule S1 ^ S2 => S1 +String S2 rule I1 - I2 => I1 -Int I2 rule - I => 0 -Int I rule I1 < I2 => I1 <Int I2 rule I1 <= I2 => I1 <=Int I2 rule I1 > I2 => I1 >Int I2 rule I1 >= I2 => I1 >=Int I2 rule V1:Val == V2:Val => V1 ==K V2 rule V1:Val != V2:Val => V1 =/=K V2 rule ! T => notBool(T) rule true && E => E rule false && _ => false rule true || _ => true rule false || E => E rule if true then E else _ => E rule if false then _ else E => E rule isVal(cons) => true rule isVal(cons _V:Val) => true rule cons V:Val [Vs:Vals] => [V,Vs] syntax Val ::= ConstructorName rule isVal(fun _) => true syntax KVar ::= Name syntax Name ::= freshName(Int) [freshGenerator, function] rule freshName(I:Int) => {#parseToken("Name", "#" +String Int2String(I))}:>Name rule (. => getMatching(P, V)) ~> (fun P->_ | _) V:Val rule matchResult(M:Map) ~> (fun _->E | _) _ => E[M] rule (matchFailure => .) ~> (fun (_->_ | Cs:Cases => Cs)) _ // rule (fun P->E | _) V:Val => E[getMatching(P,V)] when isMatching(P,V) // rule (fun (P->_ | Cs:Cases => Cs)) V:Val when notBool isMatching(P,V)
We can reduce multiple bindings to one list binding, and then
apply the usual desugaring of let into function application.
It is important that the rule below is a macro, so let is eliminated
immediately, otherwise it may interfere in ugly ways with substitution.
rule let Bs in E => ((fun [names(Bs)] -> E) [exps(Bs)]) [macro]
We only give the semantics of one-binding letrec.
Multiple bindings are left as an exercise.
// changed because of parsing error
//rule mu X:Name -> E => E[(mu X -> E) / X]
rule mu X:Name -> E => E[X |-> (mu X -> E)]
rule letrec F:Name = E in E' => let F = (mu F -> E) in E' [macro]
We cannot have & anymore, but we can give direct
semantics to ref. We also have to declare ref to
be a value, so that we will never heat on it.
// rule <k> & X => L ...</k> <env>... X |-> L </env>
rule isVal(ref) => true
rule <k> ref V:Val => !L:Int ...</k> <store>... .Map => !L |-> V ...</store>
rule <k> @ L:Int => V:Val ...</k> <store>... L |-> V ...</store>
rule <k> L:Int := V:Val => V ...</k> <store>... L |-> (_=>V) ...</store>
rule _V:Val; E => E
syntax Val ::= cc(K)
rule isVal(callcc) => true
rule <k> (callcc V:Val => V cc(K)) ~> K </k>
rule <k> cc(K) V:Val ~> _ => V ~> K </k>
Auxiliary getters
syntax Names ::= names(Bindings) [function]
rule names(.Bindings) => .Names
rule names(X:Name=_ and Bs) => X,names(Bs)
syntax Exps ::= exps(Bindings) [function]
rule exps(.Bindings) => .Exps
rule exps(_:Name=E and Bs) => E,exps(Bs)
/* Extra kore stuff */
syntax KResult ::= Vals
syntax Exps ::= Names
/* Matching */
syntax MatchResult ::= getMatching(Exp, Val) [function]
                     | getMatchingAux(Exps, Vals) [function]
                     | mergeMatching(MatchResult, MatchResult) [function]
                     | matchResult(Map)
                     | "matchFailure"
rule getMatching(C:ConstructorName(Es:Exps), C(Vs:Vals)) => getMatchingAux(Es, Vs)
rule getMatching([Es:Exps], [Vs:Vals]) => getMatchingAux(Es, Vs)
rule getMatching(C:ConstructorName, C) => matchResult(.Map)
rule getMatching(B:Bool, B) => matchResult(.Map)
rule getMatching(I:Int, I) => matchResult(.Map)
rule getMatching(S:String, S) => matchResult(.Map)
rule getMatching(N:Name, V:Val) => matchResult(N |-> V)
rule getMatching(_, _) => matchFailure [owise]
rule getMatchingAux((E:Exp, Es:Exps), (V:Val, Vs:Vals))
  => mergeMatching(getMatching(E, V), getMatchingAux(Es, Vs))
rule getMatchingAux(.Exps, .Vals) => matchResult(.Map)
rule getMatchingAux(_, _) => matchFailure [owise]
rule mergeMatching(matchResult(M1:Map), matchResult(M2:Map)) => matchResult(M1 M2)
  requires intersectSet(keys(M1), keys(M2)) ==K .Set
//rule mergeMatching(_, _) => matchFailure [owise]
rule mergeMatching(matchResult(_:Map), matchFailure) => matchFailure
rule mergeMatching(matchFailure, matchResult(_:Map)) => matchFailure
rule mergeMatching(matchFailure, matchFailure) => matchFailure
Besides the generic decomposition rules for patterns and values,
we also want to allow [head|tail] matching for lists, so we add
the following custom pattern decomposition rule:
rule getMatching([H:Exp | T:Exp], [V:Val, Vs:Vals]) => getMatchingAux((H, T), (V, [Vs])) endmodule
// NOTE: this definition is not runnable as is.
// It is intended for documentation and academic purposes only.
Author: Grigore Roșu (grosu@illinois.edu)
Organization: University of Illinois at Urbana-Champaign
Author: Traian Florin Șerbănuță (traian.serbanuta@unibuc.ro)
Organization: University of Bucharest
This is the K semantic definition of LOGIK, a trivial language
capturing the essence of the logic programming paradigm. In this
definition, we explicitly focus on simplicity and mathematical
clarity, not on advanced logic programming features or performance.
Those are covered in the LOGIK++ extension under examples/logik++.
Specifically, a LOGIK program consists of a sequence of Horn clauses
of the form
P :- P1, P2, ..., Pn .
followed by a query of the form
?- Q1, Q2, ..., Qm .
where P, P1, P2, ..., Pn, Q1, Q2, ..., Qm are literals. The
symbol :- is read "if". A literal has the form
p(T1,T2,...,Tk), where p is a predicate symbol
and where T1,T2,...,Tk are terms. Terms are built as
usual, with operation symbols and variables. A common
convention in logic programming languages, also adopted here, is that
variables are capitalized and operation symbols are not. Operations
with zero arguments are called constants and are written without
parentheses, that is, c instead of c(). Horn
clauses without conditions, called facts, are written
without :-, that is, P. instead of P :- .
For example, the LOGIK program below gives a few facts about a
parent predicate, then several clauses defining some useful
predicates including an ancestor predicate, and finally a
query asking for those who both have ancestors and are ancestors
themselves in the parent relation:
parent(david,john).
parent(jim,david).
parent(steve,jim).
parent(nathan,steve).
grandparent(A,B):-
parent(A,X),
parent(X,B).
ancestor(A,B):-
parent(A,X),
parents(X,B).
parents(X,X).
parents(A,B):-
ancestor(A,B).
both(X) :- ancestor(A,X), ancestor(X,B).
?- both(X).
Above, we only have constant operation symbols, so these and variables
are the only terms that can be used in predicates. As expected, the
LOGIK program above will give us three solutions for X:
david, steve, and jim. If we inline the
both(X) predicate in the query, that is, if we replace the
query with ?- ancestor(A,X), ancestor(X,B). then we get
10 solutions, one for each triple A, X, and
B satisfying both predicates ancestor(A,X) and
ancestor(X,B).
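These counts can be checked independently of LOGIK. The Python sketch below computes the ancestor relation bottom-up as the transitive closure of the parent facts; this is a different evaluation strategy from the resolution-based one used by logic programming languages, and the function name transitive_closure is an assumption of the sketch.

```python
parent = {
    ("david", "john"),
    ("jim", "david"),
    ("steve", "jim"),
    ("nathan", "steve"),
}


def transitive_closure(pairs):
    """Smallest relation containing pairs and closed under composition."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


ancestor = transitive_closure(parent)

# both(X): X has an ancestor and X is an ancestor of somebody.
has_ancestor = {x for (_, x) in ancestor}
is_ancestor = {x for (x, _) in ancestor}
both = sorted(has_ancestor & is_ancestor)

# The inlined query: all triples (A, X, B) with ancestor(A,X) and ancestor(X,B).
triples = [(a, x, b) for (a, x) in ancestor for (x2, b) in ancestor if x == x2]

print(both)
print(len(triples))
```

The parent facts form a single chain from nathan down to john, so ancestor contains 10 pairs, and only the three people in the middle of the chain satisfy both(X).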
As another example, the program below defines an append
predicate followed by a simple goal:
append(nil,L,L).
append(cons(H,T),L,cons(H,Z)) :- append(T,L,Z).
?- append(cons(a,nil), cons(b,nil), V).
Besides the predicate symbol append, the program above also includes a
constant symbol nil and a binary operation symbol cons. Additionally,
the query also includes two more constants, a and b. The capitalized
identifiers are all variables. As expected, the LOGIK program above
yields only one solution, namely V = cons(a,cons(b,nil)). On the other
hand, if we change the query to:
?- append(L1, cons(a,L2), cons(a,cons(b,cons(a,nil)))).
then LOGIK yields two solutions: one where L1 is cons(a,cons(b,nil))
and L2 is nil, and another where L1 is nil and L2 is
cons(b,cons(a,nil)).
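Read declaratively, the query asks for every way to split the list a, b, a around an occurrence of a. A small Python sketch of that reading, using ordinary Python lists in place of cons/nil terms (the function name splits is hypothetical):

```python
def splits(target, pivot):
    """Yield (l1, l2) such that l1 + [pivot] + l2 == target."""
    for i, item in enumerate(target):
        if item == pivot:
            yield target[:i], target[i + 1:]

# ?- append(L1, cons(a,L2), cons(a,cons(b,cons(a,nil)))).
solutions = list(splits(["a", "b", "a"], "a"))
for l1, l2 in solutions:
    print("L1 =", l1, " L2 =", l2)
```

Each pair corresponds to one LOGIK solution, with [] standing for nil.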
The programs above all generated ground solutions, that is,
solutions where the query variables are mapped to ground terms (i.e.,
terms without variables). Let us now consider the following query:
?- append(cons(a,nil), Y, Z).
There are obviously infinitely many ground solutions for the query
above, e.g.,
Y = nil and Z = cons(a,nil);
Y = cons(a,nil) and Z = cons(a,cons(a,nil));
Y = cons(b,nil) and Z = cons(a,cons(b,nil));
Y = cons(c,cons(b,nil)) and Z = cons(a,cons(c,cons(b,nil)));
etc. However, all the ground solutions for the query above can be
elegantly characterized by the property that Z is bound to a list
starting with a and followed by the list that Y is bound to. This
property can in fact be described as a symbolic solution to the query:
Z = cons(a,Y) or, equivalently, Y = Symb and Z = cons(a,Symb). It is
possible to define a "more general than" relation on such symbolic
solutions, in the sense that the more particular solution can be
obtained as a specialization/substitution of the more general one, and
then it can be shown that the above is the most general solution to the
stated query. Logic programming languages, including our LOGIK,
attempt to always compute such most general solutions.
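To make the "specialization/substitution" step concrete, here is a minimal Python sketch. It assumes terms are encoded as nested tuples (functor first) with variables and constants as strings; the names substitute, specialize and lst are introduced only for this illustration and do not come from the K definition.

```python
def substitute(term, subst):
    """Replace every variable bound in `subst` throughout `term`."""
    if isinstance(term, tuple):
        return (term[0],) + tuple(substitute(arg, subst) for arg in term[1:])
    return subst.get(term, term)

def lst(*items):
    """Build the cons/nil term for a list of items."""
    term = "nil"
    for item in reversed(items):
        term = ("cons", item, term)
    return term

def specialize(solution, subst):
    """Apply a substitution to every binding of a symbolic solution."""
    return {var: substitute(term, subst) for var, term in solution.items()}

# The symbolic solution from the text: Y = Symb and Z = cons(a,Symb)
general = {"Y": "Symb", "Z": ("cons", "a", "Symb")}

# Choosing Symb = cons(c,cons(b,nil)) recovers one of the ground solutions
ground = specialize(general, {"Symb": lst("c", "b")})
print(ground)
```

Every ground solution listed above arises this way from some choice of Symb, which is what it means for the symbolic solution to be more general.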
Logic programming languages are highly non-deterministic, in that
several Horn clauses may be used at the same time, each possibly
resulting in a different solution. Implementations of logic
programming languages consist of complex, optimized search and
indexing algorithms, which we are not concerned with here. Instead,
we here take advantage of K's builtin support for search.
Specifically, to find all the solutions of a LOGIK program, we have to
use krun with the option --search. However, note that some programs
have infinitely many solutions that are not related to each other by
the "more general" relation. An example is the query:
?- append(L1, cons(a,L2), L3) .
To address such cases and terminate, logic programming languages allow
the user to choose how many solutions are computed and displayed.
In LOGIK, we can use the --bound option of krun for this purpose.
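The idea of bounding an infinite family of solutions can be sketched in Python with a lazy generator. This is only an analogy and not how krun implements --bound: the generator below writes out by hand the solutions of the query above (L1 has exactly n elements X1..Xn, and L3 is those elements followed by cons(a,L2)), and the names cons_list and append_solutions are hypothetical.

```python
from itertools import count, islice

def cons_list(items, tail):
    """Render items followed by `tail` as a cons term, e.g. cons(X1,nil)."""
    term = tail
    for item in reversed(items):
        term = f"cons({item},{term})"
    return term

def append_solutions():
    """Lazily enumerate symbolic solutions of ?- append(L1, cons(a,L2), L3)."""
    for n in count():
        prefix = [f"X{i}" for i in range(1, n + 1)]
        yield {"L1": cons_list(prefix, "nil"),
               "L3": cons_list(prefix, "cons(a,L2)")}

# Ask for only the first three solutions, in the spirit of a bound of 3
first_three = list(islice(append_solutions(), 3))
for solution in first_three:
    print(solution)
```

No solution in this family is an instance of another, because each fixes a different length for L1; that is why a bound is needed to terminate.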
Finally, note that some queries have no solution. In some cases that
is easy to detect by exhaustive analysis, such as for the following
query:
?- append(cons(a,L1), L2, cons(b,L3)).
Logic programming languages, including LOGIK, terminate in such cases
and report a "no solution" answer. However, there are cases where
exhaustive analysis is not sufficient, such as for the query:
?- append(cons(a,L), nil, L).
In such cases, logic programming languages do not terminate. While
one may devise techniques to detect non-termination in some cases,
one cannot do it in general (as with any Turing-complete language).
requires "unification.k"

module LOGIK-COMMON
  imports DOMAINS-SYNTAX
The syntax of LOGIK is straightforward: a program is a sequence of
Horn clauses followed by a query:
  syntax Literal
  syntax Term ::= Literal | Literal "(" Terms ")"
  syntax Terms ::= List{Term,","}
  syntax Clause ::= Term ":-" Terms "."
                  | Term "."
  syntax Query ::= "?-" Terms "."
  syntax Pgm ::= Query | Clause Pgm
endmodule

module LOGIK-SYNTAX
  imports LOGIK-COMMON
  imports BUILTIN-ID-TOKENS
Variables and literals are defined as tokens following the conventions
used in Prolog (variables start with _ or a capital letter, while
literals start with a lowercase letter):
  syntax #KVariable ::= r"[A-Z_][A-Za-z0-9_]*" [token, prec(2)]
                      | #UpperId               [token]
  syntax Term ::= #KVariable [klabel(#SemanticCastToTerm)]
  syntax Literal ::= r"[a-z][a-zA-Z0-9_]*"     [token]
                   | #LowerId                  [token]
endmodule

module LOGIK
  imports LOGIK-COMMON
  imports DOMAINS
  imports UNIFICATION
Unification is at the core of logic programming. Here we are
going to use the predefined unification procedure (the same one we
used in the type inferencers in Tutorial 5).
The configuration stores each clause in its own cell for easy access,
and the most general unifier in a cell named mgu, as in the type
inferencers. The k cell holds the query and the fresh cell holds a
fresh clause instance to be attempted on the next query item. To more
easily read the solutions, we add a second top-level cell, solution.
Both top cells are optional. Indeed, we start with the main top cell
and, when a solution is found, we move it into the solution cell and
discard the main cell.
  configuration <T color="yellow" multiplicity="?">
                  <k color="green"> $PGM:Pgm </k>
                  <fresh color="orange"> .K </fresh>
                  <clauses color="red">
                    <clause color="pink" multiplicity="*"> .K </clause>
                  </clauses>
                  <mgu> .K </mgu>
                </T>
                <solution multiplicity="?"> .K </solution>
Before we launch the semantics, we first scan the given program and
place each clause in its own cell, and then place the query in the k
cell and initialize the mgu with the variables from the query. Note
that we store a fresh instance of each clause to avoid interference
with the query variables. By a "fresh instance" of a clause we mean one whose
variables are renamed with fresh names; we need that in order to avoid
undesired unification conflicts due to particular names chosen for
variables in the original program, as well as conflicts due to
subsequent uses of the same clause. It is safe to rename the
variables in a clause, because clauses are universally quantified in
their variables. This process of creating a fresh instance of a
clause is similar to how we created fresh instances of type schemas in
the higher-order type inferencer discussed in Tutorial 5. Indeed, we
can safely regard clauses as "clause schemas" comprising infinitely
many instances, one for each context.
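The renaming step can be pictured with a short Python sketch. It is not the #renameVariables function of the K unification library, only an illustration under stated assumptions: terms are nested tuples with the functor first, variables are strings starting with a capital letter or underscore, a clause is encoded as (":-", head, body), and fresh names have the made-up form _V0, _V1, and so on.

```python
from itertools import count

_fresh = count()  # shared supply of indices, so no two instances share a name

def is_var(t):
    """Variables are strings starting with an uppercase letter or underscore."""
    return isinstance(t, str) and (t[0].isupper() or t[0] == "_")

def variables(term):
    """Collect the variables of a term, in order of first occurrence."""
    if is_var(term):
        return [term]
    seen = []
    if isinstance(term, tuple):
        for arg in term[1:]:
            for v in variables(arg):
                if v not in seen:
                    seen.append(v)
    return seen

def rename(term, mapping):
    """Apply a variable-to-variable mapping throughout a term."""
    if isinstance(term, tuple):
        return (term[0],) + tuple(rename(arg, mapping) for arg in term[1:])
    return mapping.get(term, term)

def fresh_instance(clause):
    """Rename all variables of a clause consistently to brand-new names."""
    mapping = {v: f"_V{next(_fresh)}" for v in variables(clause)}
    return rename(clause, mapping)

# append(cons(H,T),L,cons(H,Z)) :- append(T,L,Z).
clause = (":-",
          ("append", ("cons", "H", "T"), "L", ("cons", "H", "Z")),
          ("append", "T", "L", "Z"))
first = fresh_instance(clause)
second = fresh_instance(clause)
print(first)
print(second)
```

The two instances have the same shape but share no variables, so using the clause twice in one derivation cannot force unrelated goals to agree on H, T, L or Z.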
  rule <k> C:Clause Pgm => Pgm </k>
       (.Bag => <clause> #renameVariables(C) </clause>)
  rule <k> ?- Ls:Terms. => Ls ...</k>
       <mgu> _ => #variablesMap(#variables(Ls)) </mgu>
We also sequentialize the goals for easier processing:
  rule L:Term, Ls:Terms => L ~> Ls
  rule .Terms => .
When all the goals are solved, indicated by the empty k cell, the
calculated most general unifier (mgu) is in the mgu cell. In that
case, to ease reading of the final solution, we move the mgu into the
solution cell and delete the rest of the configuration:
  rule <T>... <k> . </k> <mgu> Theta </mgu> ...</T>
    => <solution> Theta </solution>
Since we are not interested in seeing the failed attempts to solve
the query, we collapse all the error configurations into an empty
configuration (recall that both top-level cells in the configuration
were declared optional). This way, if we see an empty configuration
when we search for all solutions, we know that some attempts failed
(but we do not know which ones).
// this would be nice, but we need feedback from the external unifier
// for this.
//  rule <T>... <mgu> _:MguError </mgu> ...</T> => .
Once all the infrastructure is in place, the actual semantics of LOGIK
is quite simple. All we have to do is to pick some (fresh instance of
a) clause, then unify its conclusion with the first query literal, and
then replace that literal with the conditions of the clause. The
intuition here is the following: to satisfy the first literal in the
query, we need to find some instance of some clause that matches it,
and then to similarly show that we can satisfy the conditions of that
clause. Mathematically, this is an instance of the proof principle
called resolution: if p ∨ q and ¬ p ∨ r hold, then so does q ∨ r. We
leave it as an exercise to the reader to see how the two relate (hint:
assume the negation of the goal together with all the clauses, and
then derive false).
The following two rules are tightly connected and they together
perform the following core task: pick a fresh instance of a clause
which unifies with the first goal item, then add its conditions as new
goals.
Pick a clause and generate a fresh instance of it when the fresh cell
is empty:
  rule <fresh> . => #renameVariables(C) </fresh> <clause> C </clause>
       <k> T:Term ...</k>
       requires #unifiable(T,head(C))

  syntax Term ::= head(Clause) [function]
  rule head(L.) => L
  rule head(L:-_.) => L
If the goal is unifiable with the fresh clause's head, replace the goal
with the clause body, and empty the fresh cell (so that another clause
can be chosen using the rule above):
  rule <k> L:Term => . ...</k>
       <fresh> L:Term . => . </fresh>
  rule <k> L:Term :KItem => Ls ...</k>
       <fresh> L:Term :- Ls:Terms. => . </fresh>
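For readers who find an executable model helpful, the pick-unify-replace loop described above can be sketched in Python. This is an illustration under stated assumptions, not a translation of the K rules: it uses its own small unifier instead of K's unification library, explores clauses depth-first in program order rather than through krun's search, encodes terms as nested tuples, and makes fresh instances by appending a numeric suffix to variable names. All function names are hypothetical.

```python
from itertools import count

_fresh = count()

def is_var(t):
    return isinstance(t, str) and (t[0].isupper() or t[0] == "_")

def walk(t, s):
    """Follow bindings in s until a non-variable or an unbound variable."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    """True if variable v occurs in term t under substitution s."""
    t = walk(t, s)
    if isinstance(t, tuple):
        return any(occurs(v, arg, s) for arg in t[1:])
    return t == v

def unify(a, b, s):
    """Extend s so that a and b become equal, or return None on failure."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return None if occurs(b, a, s) else {**s, b: a}
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def rename(t, n):
    """Fresh instance: tag every variable of t with the suffix n."""
    if is_var(t):
        return f"{t}_{n}"
    if isinstance(t, tuple):
        return (t[0],) + tuple(rename(arg, n) for arg in t[1:])
    return t

def solve(goals, clauses, s):
    """Yield each substitution that solves all goals, left to right."""
    if not goals:
        yield s
        return
    for head, body in clauses:
        n = next(_fresh)
        s2 = unify(goals[0], rename(head, n), s)
        if s2 is not None:
            # replace the first goal with the conditions of the clause
            yield from solve([rename(b, n) for b in body] + goals[1:], clauses, s2)

def resolve(t, s):
    """Apply s exhaustively to t, for displaying an answer."""
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(resolve(arg, s) for arg in t[1:])
    return t

def lst(*items):
    term = "nil"
    for item in reversed(items):
        term = ("cons", item, term)
    return term

APPEND = [
    (("append", "nil", "L", "L"), []),
    (("append", ("cons", "H", "T"), "L", ("cons", "H", "Z")),
     [("append", "T", "L", "Z")]),
]

# ?- append(L1, cons(a,L2), cons(a,cons(b,cons(a,nil)))).
query = ("append", "L1", ("cons", "a", "L2"), lst("a", "b", "a"))
answers = [(resolve("L1", s), resolve("L2", s)) for s in solve([query], APPEND, {})]
for l1, l2 in answers:
    print("L1 =", l1, " L2 =", l2)
```

Because the sketch is depth-first, it can loop on queries where an exhaustive search would still find solutions; it is meant only to show how unification and goal replacement interact.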
Note that there is no problem if a clause is chosen whose
conclusion literal does not unify with the first goal literal.
The --search option of krun will systematically try all clauses, so no
solution is missed. Of course, the above is not the most efficient
way to implement a logic programming language, but recall that our
objective here was to present a simple and mathematically clean
solution. We encourage the interested reader to consult the LOGIK++
language definition for a more efficient definition of a richer logic
programming language.
endmodule
// NOTE: this definition is not runnable as is.
// It is intended for documentation and academic purposes only.
A list of projects using the K framework. If you are working on something interesting, and you want to share it with the community,
let us know on our socials, and we will feature you on this list.
The Algorand Virtual Machine and TEAL Semantics in K
KAVM leverages the K Framework to empower developers of Algorand smart contracts
with property-based testing and formal verification.
The K Semantics of Plutus-Core
This project aims to translate real K semantics into Dedukti.
KWasm is the K semantics of WebAssembly.
WebAssembly is a low-level (but simple and streamlined) assembly language that was originally developed to provide a fast execution engine for browser-based tools.
More recently, it has been used in several blockchain smart-contract platforms as the underlying language for executing financial agreements.
KWasm has been used for measuring coverage of test-suites over Wasm code and verifying programs which are compiled to Wasm.
KEVM is the K semantics of the Ethereum Virtual Machine.
It passes the entire Ethereum Test Suite, and is used for verifying EVM programs.
IELE is the underlying VM integrated into the Cardano blockchain.
IELE is a register-based VM (inspired by LLVM), which attempts to avoid many of the missteps in design present in EVM.
K-Michelson (Oct 2019 - Present)
K-Michelson is the K semantics of the Michelson blockchain programming language, which powers the Tezos blockchain.
K-Michelson provides additional testing tools for developers, including a unit-testing framework which is extendable to symbolic property testing.
The K semantics of the C programming language specifies the translation, linking, and execution semantics of the C language according to the official C standard.
It has been used to build tools like RV-Match, which detects undefined behaviors in users' programs by running their test suites through the C semantics.