A Naive Substitution-Based Type Inferencer
In this lesson you learn how to define a naive substitution-based type inferencer for a higher-order language, namely the LAMBDA language defined in Part 1 of the tutorial.
Unlike in the type checker defined in Lessons 2 and 3, where we had to
associate a type with each declared variable, a type inferencer
attempts to infer the types of all the variables from the way those
variables are used. Let us take a look at this program, say plus.lambda
:
lambda x . lambda y . x + y
Since x
and y
are used in an integer addition context, we can infer
that they must have the type int
and the result of the addition is
also an int
, so the type of the entire expression is int -> int -> int
.
Similarly, the program if.lambda
lambda x . lambda y . lambda z .
if x then y else z
can only make sense when x
has type bool
and y
and z
have the same
type, say t
, in which case the type of the entire expression is
bool -> t -> t -> t
. Since the type t
can be anything, we say that
the type of this expression is polymorphic. That means that the code
above can be used in different contexts, where t
can be an int
, a
bool
, a function type int -> int
, and so on.
In the identity.lambda
program
let f = lambda x . x
in f 1
f
has such a polymorphic type, which is then applied to an integer,
so this program is type-safe and its type is int
.
A typical polymorphic expression is the composition
lambda f . lambda g . lambda x .
g (f x)
which has the type (t1 -> t2) -> (t2 -> t3) -> (t1 -> t3)
, polymorphic
in 3 types.
Let us now define our naive type inferencer and then we discuss more examples. The idea is quite simple: we conceptually do the same operations like we did within the type checker defined in Lesson 2, with two important differences:
- instead of declaring a type with each declared variable, we assume a fresh type for that variable; and
- instead of checking that the types of expressions satisfy the type properties of the context in which they are used, we impose those properties as type equality constraints. A general-purpose unification-based constraint solving mechanism is then used to solve the generated type constraints.
Let us start with the syntax, which is essentially identical to that
of the type checker in Lesson 2, except that bound variables are not
declared a type anymore. Also, to keep things more compact, we put
all the Exp
syntax declarations in one syntax declaration this time.
Before we modify the rules, let us first define our machinery for
adding and solving constraints. First, we require and import the
unification procedure. We do not discuss unification here, but if you
are interested you can consult the unification.k
files under
k-distribution/include/kframework/builtin, which contains our current generic
definition of unification, which is written also in K. The generic unification
provides a sort, Mgu
, for most-general-unifier, an operation
updateMgu(Mgu,T1,T2)
which updates Mgu
with additional constraints
generated by forcing the terms T1
and T2
to be equal, and an operation
applyMgu(Mgu,T)
which applies Mgu
to term T
. For our use
of unification here, we do not even need to know how Mgu
terms are
represented internally.
We define a K item construct, =
, which takes two Type
terms and
enforces them to be equal by means of updating the current Mgu
.
Once the constraints are added to the Mgu
, the equality dissolves
itself. With this semantics of =
in mind, we can now go ahead and
modify the rules of the type checker systematically into rules
for a type inferencer. The changes are self-explanatory and
mechanical: for example, the rule
rule int * int => int
changes into rule
rule T1:Type * T2:Type => T1 = int ~> T2 = int ~> int
generating the constraints that the two arguments of multiplication
have the type int
, and the result type is int
. Recall that each type
equality on the <k/>
cell updates the current Mgu
appropriately and
then dissolves itself; thus, the above says that after imposing the
constraints T1=int
and T2=int
, multiplication yields a type int
.
As mentioned above, since types of variables are not declared anymore,
but inferred, we have to generate a fresh type for each variable at its
declaration time, and then generate appropriately constraints for it.
For example, the type semantics of lambda
and mu
become:
rule lambda X . E => T -> E[T/X] when fresh(T:Type)
rule mu X . E => (T -> T) E[T/X] when fresh(T:Type)
that is, we add a condition stating that the previously declared type
is now a fresh one. This type will be further constrained by how the
variable X
is being used within E
.
Interestingly, the previous typing rule for lambda application is not powerful enough anymore. Indeed, since types are not given anymore, it may very well be the case that the inferred type of the first argument of the application construct is not yet a function type (remember, for example, the program composition.lambda above). What we have to do is to enforce it to be a function type, by means of fresh types and constraints. We can introduce a fresh type for the result of the application, and then write the expected rule as follows:
rule T1:Type T2:Type => T1 = (T2 -> T) ~> T when fresh(T:Type)
The conditional requires that its first argument is a bool
and its
second and third arguments have the same type, which is also the
result type.
The macros do not change, in particular let
is desugared into lambda
application. We will next see that this is a significant restriction,
because it limits the polymorphism of our type system.
We are done. We have a working type inferencer for LAMBDA.
Let's kompile
it and krun
the programs above. They all work as
expected. Let us also try some additional programs, to push it to its
limits.
First, let us test mu
by means of a letrec
example:
letrec f x = 3
in f
We can also try all the programs that we had in our first tutorial, on
lambda, for example the factorial.imp
program:
letrec f x = if x <= 1 then 1 else (x * (f (x + -1)))
in (f 10)
Those programs are simple enough that they should all work as expected with our naive type inferencer here.
Let us next try to type some tricky programs, which involve more complex and indirect type constraints.
tricky-1.lambda
:
lambda f . lambda x . lambda y . (
(f x y) + x + (let x = y in x)
)
tricky-2.lambda
:
lambda x .
let f = lambda y . if true then y else x
in (lambda x . f 0)
tricky-3.lambda
:
lambda x . let f = lambda y . if true then x 7 else x y
in f
tricky-4.lambda
:
lambda x . let f = lambda x . x
in let d = (f x) + 1
in x
tricky-5.lambda
:
lambda x . let f = lambda y . x y
in let z = x 0 in f
It is now time to see the limitations of this naive type inferencer. Consider the program
let id = lambda x . x
in if (id true) then (id 1) else (id 2)
Our type inferencer fails graciously with a clash in the <mgu/>
cell
between int
and bool
. Indeed, the desugaring macro of let
turns it
into a lambda
and an application, which further enforce id
to have a
type of the form t -> t
for some fresh type t
. The first use of id
in the condition of if
will then constrain t
to be bool
, while the
other uses in the two branches will enforce t
to be int
. Thus the
clash in the <mgu/>
cell.
Similarly, the program
let id = lambda x . x
in id id
yields a different kind of conflict: if id
has type t -> t
, in order
to apply id
to itself it must be the case that its argument, t
, equals
t -> t
. These two type terms cannot be unified because there is a
circular dependence on t
, so we get a cycle in the <mgu/>
cell.
Both limitations above will be solved when we change the semantics of
let
later on, to account for the desired polymorphism.
Before we conclude this lesson, let us see one more interesting example, where the lack of let-polymorphism leads not to a type error, but to a less generic type:
let f1 = lambda x . x in
let f2 = f1 in
let f3 = f2 in
let f4 = f3 in
let f5 = f4 in
if (f5 true) then f2 else f3
Our current type inferencer will infer the type bool -> bool
for the
program above. Nevertheless, since all functions f1
, f2
, f3
, f4
, f5
are the identity function, which is polymorphic, we would expect the
entire program to type to the same polymorphic identity function type.
This limitation will be also addressed when we define our let-polymorphic type inferencer.
Before that, in the next lesson we will show how easily we can turn the naive substitution-based type inferencer discussed in this lesson into a similarly naive, but environment-based type inferencer.
Go to Lesson 5, Type Systems: A Naive Environment-Based Type Inferencer.