A Naive Substitution-Based Type Inferencer

In this lesson you learn how to define a naive substitution-based type inferencer for a higher-order language, namely the LAMBDA language defined in Part 1 of the tutorial.

Unlike in the type checker defined in Lessons 2 and 3, where we had to associate a type with each declared variable, a type inferencer attempts to infer the types of all the variables from the way those variables are used. Let us take a look at this program, say plus.lambda:

lambda x . lambda y . x + y

Since x and y are used in an integer addition context, we can infer that they must have the type int and the result of the addition is also an int, so the type of the entire expression is int -> int -> int. Similarly, the program if.lambda

lambda x . lambda y . lambda z .
  if x then y else z

can only make sense when x has type bool and y and z have the same type, say t, in which case the type of the entire expression is bool -> t -> t -> t. Since the type t can be anything, we say that the type of this expression is polymorphic. That means that the code above can be used in different contexts, where t can be an int, a bool, a function type int -> int, and so on.

In the identity.lambda program

let f = lambda x . x
in f 1

f has such a polymorphic type, which is then applied to an integer, so this program is type-safe and its type is int.

A typical polymorphic expression is the composition

lambda f . lambda g . lambda x .
  g (f x)

which has the type (t1 -> t2) -> (t2 -> t3) -> (t1 -> t3), polymorphic in 3 types.

Let us now define our naive type inferencer and then we discuss more examples. The idea is quite simple: we conceptually do the same operations like we did within the type checker defined in Lesson 2, with two important differences:

instead of declaring a type with each declared variable, we assume a fresh type for that variable; and
instead of checking that the types of expressions satisfy the type properties of the context in which they are used, we impose those properties as type equality constraints. A general-purpose unification-based constraint solving mechanism is then used to solve the generated type constraints.

Let us start with the syntax, which is essentially identical to that of the type checker in Lesson 2, except that bound variables are not declared a type anymore. Also, to keep things more compact, we put all the Exp syntax declarations in one syntax declaration this time.

Before we modify the rules, let us first define our machinery for adding and solving constraints. First, we require and import the unification procedure. We do not discuss unification here, but if you are interested you can consult the unification.k files under k-distribution/include/kframework/builtin, which contains our current generic definition of unification, which is written also in K. The generic unification provides a sort, Mgu, for most-general-unifier, an operation updateMgu(Mgu,T1,T2) which updates Mgu with additional constraints generated by forcing the terms T1 and T2 to be equal, and an operation applyMgu(Mgu,T) which applies Mgu to term T. For our use of unification here, we do not even need to know how Mgu terms are represented internally.

We define a K item construct, =, which takes two Type terms and enforces them to be equal by means of updating the current Mgu. Once the constraints are added to the Mgu, the equality dissolves itself. With this semantics of = in mind, we can now go ahead and modify the rules of the type checker systematically into rules for a type inferencer. The changes are self-explanatory and mechanical: for example, the rule

rule int * int => int

changes into rule

rule T1:Type  * T2:Type => T1 = int ~> T2 = int ~> int

generating the constraints that the two arguments of multiplication have the type int, and the result type is int. Recall that each type equality on the <k/> cell updates the current Mgu appropriately and then dissolves itself; thus, the above says that after imposing the constraints T1=int and T2=int, multiplication yields a type int.

As mentioned above, since types of variables are not declared anymore, but inferred, we have to generate a fresh type for each variable at its declaration time, and then generate appropriately constraints for it. For example, the type semantics of lambda and mu become:

rule lambda X . E => T -> E[T/X]  when fresh(T:Type)
rule mu X . E => (T -> T) E[T/X]  when fresh(T:Type)

that is, we add a condition stating that the previously declared type is now a fresh one. This type will be further constrained by how the variable X is being used within E.

Interestingly, the previous typing rule for lambda application is not powerful enough anymore. Indeed, since types are not given anymore, it may very well be the case that the inferred type of the first argument of the application construct is not yet a function type (remember, for example, the program composition.lambda above). What we have to do is to enforce it to be a function type, by means of fresh types and constraints. We can introduce a fresh type for the result of the application, and then write the expected rule as follows:

rule T1:Type T2:Type => T1 = (T2 -> T) ~> T  when fresh(T:Type)

The conditional requires that its first argument is a bool and its second and third arguments have the same type, which is also the result type.

The macros do not change, in particular let is desugared into lambda application. We will next see that this is a significant restriction, because it limits the polymorphism of our type system.

We are done. We have a working type inferencer for LAMBDA.

Let's kompile it and krun the programs above. They all work as expected. Let us also try some additional programs, to push it to its limits.

First, let us test mu by means of a letrec example:

letrec f x = 3
in f

We can also try all the programs that we had in our first tutorial, on lambda, for example the factorial.imp program:

letrec f x = if x <= 1 then 1 else (x * (f (x + -1)))
in (f 10)

Those programs are simple enough that they should all work as expected with our naive type inferencer here.

Let us next try to type some tricky programs, which involve more complex and indirect type constraints.

tricky-1.lambda:

lambda f . lambda x . lambda y . (
  (f x y) + x + (let x = y in x)
)

tricky-2.lambda:

lambda x .
  let f = lambda y . if true then y else x
  in (lambda x . f 0)

tricky-3.lambda:

lambda x . let f = lambda y . if true then x 7 else x y
           in f

tricky-4.lambda:

lambda x . let f = lambda x . x
           in let d = (f x) + 1
              in x

tricky-5.lambda:

lambda x . let f = lambda y . x y
           in let z = x 0 in f

It is now time to see the limitations of this naive type inferencer. Consider the program

let id = lambda x . x
in if (id true) then (id 1) else (id 2)

Our type inferencer fails graciously with a clash in the <mgu/> cell between int and bool. Indeed, the desugaring macro of let turns it into a lambda and an application, which further enforce id to have a type of the form t -> t for some fresh type t. The first use of id in the condition of if will then constrain t to be bool, while the other uses in the two branches will enforce t to be int. Thus the clash in the <mgu/> cell.

Similarly, the program

let id = lambda x . x
in id id

yields a different kind of conflict: if id has type t -> t, in order to apply id to itself it must be the case that its argument, t, equals t -> t. These two type terms cannot be unified because there is a circular dependence on t, so we get a cycle in the <mgu/> cell.

Both limitations above will be solved when we change the semantics of let later on, to account for the desired polymorphism.

Before we conclude this lesson, let us see one more interesting example, where the lack of let-polymorphism leads not to a type error, but to a less generic type:

let f1 = lambda x . x in
  let f2 = f1 in
    let f3 = f2 in
      let f4 = f3 in
        let f5 = f4 in
          if (f5 true) then f2 else f3

Our current type inferencer will infer the type bool -> bool for the program above. Nevertheless, since all functions f1, f2, f3, f4, f5 are the identity function, which is polymorphic, we would expect the entire program to type to the same polymorphic identity function type.

This limitation will be also addressed when we define our let-polymorphic type inferencer.

Before that, in the next lesson we will show how easily we can turn the naive substitution-based type inferencer discussed in this lesson into a similarly naive, but environment-based type inferencer.

Go to Lesson 5, Type Systems: A Naive Environment-Based Type Inferencer.