Let-Polymorphic Type Inferencer (Damas-Hindley-Milner)
In this lesson we discuss a type inferencer based on what we call today the Damas-Hindley-Milner type system, which is at the core of many modern functional programming languages. The first variant of it was proposed by Hindley in 1969, then, interestingly, Milner rediscovered it in 1978 in the context of the ML language. Damas formalized it as a type system in his PhD thesis in 1985. More specifically, our type inferencer here, like many others as well as many implementations of it, follows more closely the syntax-driven variant proposed by Clement in 1987.
In terms of K, we will see how easily we can turn one definition which is considered naive (our previous type inferencer in Lesson 8) into a definition which is considered advanced. All we have to do is to change one existing rule (the rule of the let binder) and to add a new one. We will also learn some new predefined features of K, which make the above possible.
The main idea is to replace the rule
rule let X = E in E' => E'[E/X]
which creates potentially many copies of E
within E'
with a rule
which types E
once and then reuses that type in each place where X
occurs free in E'
. The simplest K way to type E
is to declare the
let construct strict(2)
. Now we cannot simply bind X
to the type
of E
, because we would obtain a variant of the naive type inferencer
we already discussed, together with its limitations, in Lesson 5 of this
tutorial. The trick here is to parameterize the type of E
in all its
unconstrained fresh types, and then create fresh copies of those
parameters in each free occurrence of X
in E'
.
Let us discuss some examples, before we go into the technical details. Consider the first let-polymorphic example which failed to be typed with our first naive type-inferencer:
let id = lambda x . x
in if (id true) then (id 1) else (id 2)
When typing lambda x . x
, we get a type of the form t -> t
, for some
fresh type t
. Instead of assigning this type to id
as we did in the
naive type inferencers, we now first parametrize this type in its
fresh variable t
, written
(forall t) t -> t
and then bind id
to this parametric type. The intuition for the
parameter is that it can be instantiated with any other type, so this
parametric type stands, in fact, for infinitely many non-parametric
types. This is similar to what happens in formal logic proof systems,
where rule schemas stand for infinitely many concrete instances of
them. For this reason, parametric types are also called type schemas.
Now each time id
is looked up within the let-body, we create a fresh
copy of the parameter t
, which can this way be independently
constrained by each local context. Let's suppose that the three id
lookups yield the types t1 -> t1
, t2 -> t2
, and respectively t3 -> t3
.
Then t1
will be constrained to be bool
, and t2
and t3
to be int
,
so we can now safely type the program above to int
.
Therefore, a type schema comprises a summary of all the typing work that has been done for typing the corresponding expression, and an instantiation of its parameters with fresh copies represents an elegant way to reuse all that typing work.
There are some subtleties regarding what fresh types can be made parameters. Let us consider another example, discussed as part of Lesson 7 on naive let-polymorphism:
lambda x . (
let y = lambda z . x
in if (y true) then (y 1) else (y (lambda x . x))
)
This program should type to bool -> bool
, as explained in Lesson 7.
The lambda
construct will bind x
to some fresh type tx
. Then the
let-bound expression lambda z . x
types to tz -> tx
for some
additional fresh type tz
. The question now is what should the
parameters of this type be when we generate the type schema? If we
naively parameterize in all fresh variables, that is in both tz
and
tx
obtaining the type schema (forall tz,tx) tz -> tx
, then there will
be no way to infer that the type of x
, tx
, must be a bool
! The
inferred type of this expression would then wrongly be tx -> t
for
some fresh types tx
and t
. That's because the parameters are replaced
with fresh copies in each occurrence of y
, and thus their relationship
to the original x
is completely lost. This tells us that we cannot
parameterize in all fresh types that appear in the type of the
let-bound expression. In particular, we cannot parameterize in those
which some variables are already bound to in the current type
environment (like x
is bound to tx
in our example above).
In our example, the correct type schema is (forall tz) tz -> tx
,
which now allows us to correctly infer that tx
is bool
.
Let us now discuss another example, which should fail to type:
lambda x .
let f = lambda y . x y
in if (f true) then (f 1) else (f 2)
This should fail to type because lambda y . x y
is equivalent to x
,
so the conditional imposes the conflicting constraints that x
should be
a function whose argument is either a bool
or an int
. Let us try to
type it using our currently informal procedure. Like in the previous
example, x
will be bound to a fresh type tx
. Then the let-bound
expression types to ty -> tz
with ty
and tz
fresh types, adding also
the constraint tx = ty -> tz
. What should the parameters of this type
be? If we ignore the type constraint and simply make both ty
and tz
parameters because no variable is bound to them in the type
environment (indeed, the only variable x
in the type environment is
bound to tx
), then we can wrongly type this program to tx -> tz
following a reasoning similar to the one in the example above.
In fact, in this example, none of ty
and tz
can be parameters, because
they are constrained by tx
.
The examples above tell us two things: first, that we have to take the type constraints into account when deciding the parameters of the schema; second, that after applying the most-general-unifier solution given by the type constraints everywhere, the remaining fresh types appearing anywhere in the type environment are consequently constrained and cannot be turned into parameters. Since the type environment can in fact also hold type schemas, which already bind some types, we only need to ensure that none of the fresh types appearing free anywhere in the type environment are turned into parameters of type schemas.
Thanks to generic support offered by the K tool, we can easily achieve all the above as follows.
First, add syntax for type schemas:
syntax TypeSchema ::= "(" "forall" Set ")" Type [binder]
The definition below will be given in such a way that the Set
argument
of a type schema will always be a set of fresh types. We also declare
this construct to be a binder
, so that we can make use of the generic
free variable function provided by the K tool.
We now replace the old rule for let
rule let X = E in E' => E'[E/X]
with the following rule:
rule <k> let X = T:Type in E => E ~> tenv(TEnv) ...</k>
<mgu> Theta:Mgu </mgu>
<tenv> TEnv
=> TEnv[(forall freeVariables(applyMgu(Theta, T)) -Set
freeVariables(applyMgu(Theta, values TEnv))
) applyMgu(Theta, T) / X]
</tenv>
So the type T
of E
is being parameterized and then bound to X
in the
type environment. The current mgu Theta
, which comprises all the type
constraints accumulated so far, is applied to both T
and the types in
the type environment. The remaining fresh types in T
which do not
appear free in the type environment are then turned into type parameters.
The function freeVariables
returns, as expected, the free variables of
its argument as a Set
; this is why we declared the type schema to be a
binder above.
Now a LAMBDA variable in the type environment can be bound to either a
type or a type schema. In the first case, the previous rule we had
for variable lookup can be reused, but we have to make sure we check
that T
there is of sort Type
(adding a sort membership, for example).
In the second case, as explained above, we have to create fresh copies
of the parameters. This can be easily achieved with another
predefined K function, as follows:
rule <k> X:Id => freshVariables(Tvs,T) ...</k>
<tenv>... X |-> (forall Tvs) T ...</tenv>
Indeed, freshVariables
takes a set of variables and a term, and returns the
same term but with each of the given variables replaced by a fresh copy.
The operations freeVariables
and freshVariables
are useful in many K
definitions, so they are predefined in module substitution.k
.
Our definition of this let-polymorphic type inferencer is now
complete. To test it, kompile
it and then krun
all the LAMBDA
programs discussed since Lesson 4. They should all work as expected.