Immutable Code in Unison

Surprisingly good things happen if your language doesn’t let you change the code you write.

Published

February 26, 2023

I’m going to show you some magic, and then reveal how it is done.

To do that, I’m going to use the Unison programming language, which enforces both immutable data and immutable code.[1]

This is a long post: I promise it’s worth reading all of it. By the end, you’ll see why immutable code has the potential to fix both dependency hell and software distribution: no more package managers, no more builders/bundlers, and no more complex deployment.

Before we start, there are two things you need to know:

  1. Unison function definitions are like Haskell’s. Here’s a function that multiplies two natural numbers:[2]

    Unison
    times: Nat -> Nat -> Nat
    times a b = a * b

    The first line is the type signature: times takes a natural number (Nat), then another Nat, and returns a Nat.

    The second line is the definition: the function has two parameters, a and b, shown to the left of the equals sign, and a body, to the right. The value of the function when called is the value of the body.

  2. Unison doesn’t use (long-lived) source files. Instead, it manages all your code itself inside a namespaced repository.[3] Your interface to this repository is the ucm[4] command.

    In normal use, you cd into some temporary, empty, directory and run the ucm command. In another window, you use your favorite editor to create Unison source files in that same directory. When you save a file, ucm compiles its contents and shows you the functions it found. It also runs any tests you’ve defined. If you’re happy, you can add the function to (or update the function in) the Unison code repository.

In the magic show that follows, you’ll see ucm sessions along with Unison source code that I entered into my editor. I’ll label these blocks with UCM or Unison for clarity.

Abracadabra

We start in ucm, where we create a namespace and import the standard library into it.

UCM
.> cd magic
.magic> fork .base lib.base
    Done.

Now, in the editor, we’ll create a function that adds its arguments.

Unison
add: Nat -> Nat -> Nat
add a b = a + b

test> add.tests = check (add 3 4 == 7)

The last line defines a test (which becomes its own function). We bind it to the name add.tests. This isn’t necessary, but it makes it easier to see what’s going on.

We save the source, and the ucm window bursts into life:

UCM
    ⍟ These new definitions are ok to `add`:
    
      add : Nat -> Nat -> Nat
      add.tests : [Result]

  Now evaluating any watch expressions (lines starting with `>`)...

    4 | test> add.tests = check (add 3 4 == 7)
    
    ✅ Passed : Proved.

ucm now has local copies of our add and add.tests functions.

We’ll add these two functions to the repository:

UCM
.magic> add 
  ⍟ I've added these definitions:
    add : Nat -> Nat -> Nat
    add.tests : [Result]

Back in our editor we’ll define two new functions, square and sumSquare, along with a couple of tests.

Unison
square: Nat -> Nat
square a = a * a

test> square.tests = check (square 3 == 9)

sumSquare : Nat -> Nat -> Nat
sumSquare a b = add (square a) (square b)

test> sumSquare.tests = check (sumSquare 3 4 == 25)

Save it, and ucm reports:

UCM
   ⍟ These new definitions are ok to `add`:
    
      square          : Nat -> Nat
      square.tests    : [Result]
      sumSquare       : Nat -> Nat -> Nat
      sumSquare.tests : [Result]
  
    4 | test> square.tests = check (square 3 == 9)
    ✅ Passed : Proved. (cached)
  
    9 | test> sumSquare.tests = check (sumSquare 3 4 == 25)
    ✅ Passed : Proved. (cached)

Add square and sumSquare to the repository (along with their tests).

UCM
.magic> add

  ⍟ I've added these definitions:
  
    square : Nat -> Nat
    sumSquare  : Nat -> Nat -> Nat

At this point, feel free to exit your editor and delete the scratch source file. Honest.

Back in ucm, we can prove that our source is still safely stored, and that the tests still pass:

UCM
.magic> view sumSquare

  sumSquare : Nat -> Nat -> Nat
  sumSquare a b = add (square a) (square b)

.magic> test

  ◉ square.tests     : Proved.
  ◉ add.tests        : Proved.
  ◉ sumSquare.tests : Proved.
  
  ✅ 3 test(s) passing

It looks like our code is safely tucked away inside Unison.

Part Two: What’s in a Name?

Looking at the test results from the previous ucm output, I’m struck by an inconsistency: we called our addition function add, but the function that adds two squares is called sumSquare. Fortunately, we can use ucm to rename add.

UCM
.magic> move.term add sum
   Done.

Let’s run the tests again:

UCM
.magic> test
  ◉ square.tests    : Proved.
  ◉ add.tests       : Proved.
  ◉ sumSquare.tests : Proved.
  
  ✅ 3 test(s) passing

Look at that. We renamed the add function, but not its test. We’ll fix that:

UCM
.magic> move.term add.tests sum.tests
   Done.
.magic> test
  ◉ square.tests    : Proved.
  ◉ sum.tests       : Proved.
  ◉ sumSquare.tests : Proved.
  
  ✅ 3 test(s) passing

Cool. Except… it shouldn’t have worked. The sumSquare function was written to use add, but add no longer exists. Let’s double check the source:

UCM
.magic> view sumSquare

  sumSquare : Nat -> Nat -> Nat
  sumSquare a b = sum (square a) (square b)

We never touched the source of sumSquare, but somehow the call to add was replaced by a call to sum. That’s why the tests ran.

OK, you’re thinking. When we renamed add, ucm went through the repository and changed the word add to sum.

The truth is far cooler than that.

Part Three: Bring out the chainsaw

We now have our code safely stored in a box, the Unison repository. Time for the finale of our trick: let’s do some damage and saw it in half.

Open up your editor and create a new definition for the function sum. This version takes three parameters, not two.

Unison
sum: Nat -> Nat -> Nat -> Nat
sum a b c = a + b + c

test> sum.tests = check (sum 3 4 5 == 12)

ucm, being the honey badger of development environments, just doesn’t care:

UCM
   ⍟ These names already exist. You can `update` them to your new
      definition:
    
      sum       : Nat -> Nat -> Nat -> Nat
      sum.tests : [Result]
  
    6 | test> sum.tests = check (sum 3 4 5 == 12)
    
    ✅ Passed : Proved. (cached)

Let’s again do as it suggests and update the sum function, then rerun all the tests:

UCM
.magic> update

  ⍟ I've updated these names to your new definition:
  
    sum       : Nat -> Nat -> Nat -> Nat
    sum.tests : [Result]

.magic> test

  Cached test results (`help testcache` to learn more)
  
  ◉ square.tests       : Proved.
  ◉ sum.tests          : Proved.
  ◉ sumSquare.tests    : Proved.

So that’s just weird. We replaced the sum function with one that is incompatible with the original, and yet sumSquare still works.

Let’s have a look at sumSquare one more time:

UCM
.magic> view sumSquare

  sumSquare : Nat -> Nat -> Nat
  sumSquare a b = #aut6jgfc1j (square a) (square b)

Whoa: there’s a strange set of characters, #aut6jgfc1j, where the function sum used to be. If we assume it’s a function name, perhaps we can view it:

UCM
.magic> view #aut6jgfc1j

  #aut6jgfc1j : Nat -> Nat -> Nat
  #aut6jgfc1j a b =
    use Nat +
    a + b

That’s our original sum function, but with a new name.

Part Four: Finale

Some time later, after I’d forgotten about the whole sum/add fiasco, I was back writing code. I needed a function to total some numbers, so I wrote:

Unison
total : Nat -> Nat -> Nat
total addend1 addend2 = 
  addend1 + addend2

test> total.tests = check (total 5 6 == 11)

I saved it, and over in ucm I added it to the repository and ran tests.

UCM
   ⍟ These new definitions are ok to `add`:
    
      total       : Nat -> Nat -> Nat
      total.tests : [Result]
  
    4 | test> total.tests = check (total 5 6 == 11)
    ✅ Passed : Proved.

.magic> add
  ⍟ I've added these definitions:
  
    total       : Nat -> Nat -> Nat
    total.tests : [Result]

At that point I remembered the strange function name in sumSquare. I decided to have another look at it:

UCM
.magic> view sumSquare
  sumSquare : Nat -> Nat -> Nat
  sumSquare a b = total (square a) (square b)

ucm noticed that the total function does exactly what the old sum function did, even though the name and parameter names are different. This means it can now use the more readable name, total, in place of #aut6jgfc1j.

Magic!

What we saw

  • Unison manages your source code internally: what you see in your editor is ephemeral.

  • The names of functions can be changed (one way is to use move.term), and that change is reflected in all sites that reference that function.

  • Replacing a function with a new, incompatible, version doesn’t break existing uses of that function. Instead, those existing uses are replaced with a strange name starting with #.

  • If you subsequently define a function that does the same thing as the original, Unison replaces the #... name with that new function’s name.

Next we’ll see how all this is done. Now might be a good time for a stretch…

How It Works

We’ve all heard of immutable data. It’s one of the cornerstones of functional programming. Immutability makes it easier to reason about your code, easier to reuse your functions, and easier to write concurrent functions.

Immutable data sounds like a good idea.

Well, it turns out that having immutable code is also amazingly beneficial: it’s how Unison pulls off all the tricks with our add function.

Let’s investigate this from the bottom up.

What is a Function?

Here’s the definition of a function that sums its arguments, written in several languages:

sum a b = a + b
const sum = (a, b) => a + b
sum = -> (a, b) { a + b }
sum = lambda a, b : a + b

If you’re like me,[5] you’ll interpret this code as “create a function called sum that adds its arguments.” But that’s not really true. All of these code fragments do two things: they create a function that sums its arguments, and then they associate that function with a variable or constant called sum. The function and the name are separate.

If you’re not convinced, have a look at this JavaScript fragment.

let sum = (a, b) => a + b
let add = sum
sum = "Hello"
console.log(add(3, 4))   // => 7
console.log(sum(3, 4))   // error: `sum` is not a function

On line 2 we copy the function value into the variable add, and on line 3 we reassign sum.

The function is independent of the name it’s bound to.

The Names of a Function

But I tell you, a cat needs a name that’s particular,
     A name that’s peculiar, and more dignified,
Else how can he keep up his tail perpendicular,
     Or spread out his whiskers, or cherish his pride?
Of names of this kind, I can give you a quorum,
     Such as Munkustrap, Quaxo, or Coricopat,
Such as Bombalurina, or else Jellylorum—
     Names that never belong to more than one cat.

T. S. Eliot, The Naming of Cats

T. S. Eliot tells us cats have three names: the name we call them, the unique name they call each other, and a secret name known only to the cat itself.

Functions are like cats in a couple of ways: they often ignore what we tell them to do, and they have their own secret name as well as the names that we call them.

For functions, that secret name is their implementation; their code.

How can we turn an implementation into some kind of name? One way is to use the abstract syntax tree (AST) which the compiler uses to represent the function.

Possible AST of sum a b = a + b. The shaded cells are the function implementation.

The AST of the function implementation is just a data structure. We can generate a hash value from it, and use that hash as the internal (or secret) name of this particular function.

In Unison, this hash value is 512 bits long, making the chances that two different functions will have the same hash effectively zero.
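To make the idea concrete, here is a minimal Python sketch of hashing a tree. It is not how Unison does it: the nested-list AST encoding is made up for illustration, and SHA3-512 is only an assumed stand-in for a 512-bit hash. The short names it prints will not match the ones ucm shows.

```python
import hashlib
import json

def hash_ast(ast):
    """Return a 512-bit hash (as 128 hex characters) of a toy AST."""
    # Serialize deterministically so that equal trees give equal bytes.
    encoded = json.dumps(ast).encode("utf-8")
    return hashlib.sha3_512(encoded).hexdigest()

def short_name(ast):
    """A '#'-prefixed prefix of the hash, for display only."""
    return "#" + hash_ast(ast)[:10]

# Toy ASTs as nested lists: [node kind, ...children]. This encoding is made up.
two_arg_sum = ["lambda", 2, ["+", ["param", 0], ["param", 1]]]
three_arg_sum = ["lambda", 3, ["+", ["+", ["param", 0], ["param", 1]], ["param", 2]]]

print(short_name(two_arg_sum))     # the same tree always gives the same name
print(short_name(three_arg_sum))   # a different tree gives a different name
```

The point is only that the name is computed from the implementation, so nobody has to choose it and nobody can change it.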

Initially Defining a Function

When we defined the add function, Unison computed the hash of its AST, and associated the hash with the AST representation. Once added to the repository, these two things will never change: the function body and the hash that references it are both immutable.

At the same time, Unison creates an alias to the hash. In this case, that alias is add, the name we first bound to the function.

We then used move.term in ucm. Although it looked like we were renaming the function, all we were really doing was replacing one alias for that function with another.

Even if we deleted all aliases, the function would still be there, and can be referenced using its secret hash name.
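One way to picture this is as two separate tables, sketched here in Python. This is a toy model under my own assumptions, not Unison's actual storage format; the move_term helper is a hypothetical name borrowed from the ucm command.

```python
# hash -> implementation. Entries are added, never changed or removed.
definitions = {"#aut6jgfc1j": "a b -> a + b"}   # implementation shown as text for brevity
# alias -> hash. This is the only part a rename touches.
names = {"add": "#aut6jgfc1j"}

def move_term(names, old, new):
    """Rebind an alias to a new name, leaving the stored definition alone."""
    names[new] = names.pop(old)

before = dict(definitions)
move_term(names, "add", "sum")

print(names)                   # {'sum': '#aut6jgfc1j'}
print(definitions == before)   # True
```

Emptying the names table entirely would still leave the definition in place, reachable by its hash.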

Using That Function

We then defined two more functions, square and sumSquare. The square function introduces no new concepts: an AST is created, a hash is used to name it, and the alias square references that hash.

But sumSquare lets us explore the second trick in our magic show: how did Unison update the call to add to a call to sum when we renamed it?

It turns out there’s nothing really to do. When creating the AST, Unison resolves the names of functions that are called via their aliases to their underlying hash.

Let’s repeat that, because it is the foundation from which all the benefits of immutability grow. Functions are always called by their internal hash, and not their local name.

Here’s part of the AST for sumSquare a b = add (square a) (square b):

See how the name in the call node is the hash for the add function.

Because the hash corresponds to the implementation of the function, what we’re effectively doing is calling the implementation of the function which just happens to have a particular name (add, initially).
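Here is a rough Python sketch of that resolution step. The nested-list AST and the resolve helper are hypothetical, and the hash for square is invented; only #aut6jgfc1j comes from the session above.

```python
def resolve(ast, names):
    """Replace every aliased name in a toy AST with the hash it points to right now."""
    if isinstance(ast, list):
        return [resolve(node, names) for node in ast]
    return names.get(ast, ast)   # anything that isn't an alias (a parameter, say) is kept

# "#sq0000000a" is a made-up hash for square.
names = {"add": "#aut6jgfc1j", "square": "#sq0000000a"}

# sumSquare a b = add (square a) (square b), written as [function, argument, ...]
source = ["add", ["square", "a"], ["square", "b"]]
stored = resolve(source, names)
print(stored)
# ['#aut6jgfc1j', ['#sq0000000a', 'a'], ['#sq0000000a', 'b']]

# Renaming add to sum changes the alias table, not what was stored.
names["sum"] = names.pop("add")
```

Once the names have been resolved, the stored tree no longer mentions add at all, so there is nothing to rewrite when the alias changes.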

Viewing a function

ucm stores terms in the repository as their AST, not as their source. Whenever we fetch a term to display or edit it, ucm converts that AST back into plain text.[6] As part of that process, it looks up the secret names of functions in the alias list. If it finds an alias, it substitutes it for the hash name. That’s why, when we said view sumSquare, we initially saw it using add as the name of the function it called. When we replaced add with sum in the alias table, Unison did the same lookup and came back with the new name: miraculously the function is now called sum. (Of course, all the time it’s actually called #aut6jgfc1j.)
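The reverse lookup can be sketched in a few lines of Python. As before, this is a toy: the render helper and the nested-list AST are my own inventions, and the hash for square is made up.

```python
def render(ast, names):
    """Turn a stored toy AST back into text, using an alias where one exists."""
    hash_to_name = {h: name for name, h in names.items()}
    if isinstance(ast, list):
        head, *args = ast
        words = [render(head, names)]
        for arg in args:
            text = render(arg, names)
            words.append(f"({text})" if isinstance(arg, list) else text)
        return " ".join(words)
    return hash_to_name.get(ast, ast)   # no alias: show the raw hash or parameter

stored = ["#aut6jgfc1j", ["#sq0000000a", "a"], ["#sq0000000a", "b"]]

names = {"add": "#aut6jgfc1j", "square": "#sq0000000a"}
print(render(stored, names))   # add (square a) (square b)

names["sum"] = names.pop("add")
print(render(stored, names))   # sum (square a) (square b)
```

The stored tree is identical in both calls; only the table used to print it has changed.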

Changing a Function

Then we changed the definition of sum to have three arguments.

When ucm compiles this, the AST will be different to that of the original add or sum; its hash will be different.[7] This means that when it gets stored, it will be under the new hash, and the alias sum will point to that rather than the original.

But nothing has changed in the implementation of sumSquare: that function call still references the hash #aut6jgfc1j, which refers to the original implementation. When we call sumSquare it will still call our original two-argument version of sum.

However, if we view sumSquare, it won’t be able to find an alias for that hash when generating the text version of the function. That’s why you see the hash instead.

UCM
.magic> view sumSquare

  sumSquare : Nat -> Nat -> Nat
  sumSquare a b = #aut6jgfc1j (square a) (square b)

That’s actually valid Unison syntax: you can use the (not-so-)secret hash in place of the function name.
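A small Python model shows why the update is harmless. It assumes a store that only ever adds definitions and a names table that can be repointed; the store helper and the hash for sumSquare are hypothetical, while the two hashes for sum are the ones from this article.

```python
definitions = {}   # hash -> implementation; entries are only ever added
names = {}         # alias -> hash; entries can be rebound

def store(hash_name, alias, implementation):
    """Add a definition under its hash and point the alias at it."""
    definitions.setdefault(hash_name, implementation)   # never overwrite an existing hash
    names[alias] = hash_name

# The original two-argument function.
store("#aut6jgfc1j", "sum", lambda a, b: a + b)

# sumSquare calls that hash directly. "#sumsq00000" is a made-up hash.
store("#sumsq00000", "sumSquare",
      lambda a, b: definitions["#aut6jgfc1j"](a * a, b * b))

# update: the three-argument version gets a new hash, and sum is repointed.
store("#r4ohr76lvt", "sum", lambda a, b, c: a + b + c)

print(definitions[names["sumSquare"]](3, 4))    # 25: still the two-argument code
print(definitions[names["sum"]](3, 4, 5))       # 12: the new definition
print("#aut6jgfc1j" in names.values())          # False: no alias left for the old hash
```

The last line is the reason view has to fall back to printing the hash: the old definition is still there, but nothing in the names table points at it.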

Discovering a New Name

Finally we created a total function:

Unison
total : Nat -> Nat -> Nat
total addend1 addend2 = 
  addend1 + addend2

Compare it to our original add function:

Unison
add: Nat -> Nat -> Nat
add a b = a + b

Because Unison doesn’t care about the names of parameters or the textual layout of the function when creating the AST hash, it turns out that the total function has the same hash as the add function: #aut6jgfc1j.
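One plausible way to get that behavior, sketched in Python, is to replace each parameter name with its position before hashing. I’m assuming this approach for illustration; the normalize and hash_definition helpers are hypothetical, SHA3-512 is an assumed stand-in, and the resulting names won’t match Unison’s.

```python
import hashlib
import json

def normalize(params, body):
    """Replace parameter names with their positions so naming doesn't matter."""
    index = {name: i for i, name in enumerate(params)}

    def walk(node):
        if isinstance(node, list):
            return [walk(child) for child in node]
        return ["param", index[node]] if node in index else node

    return [len(params), walk(body)]

def hash_definition(params, body):
    """Hash the normalized form; layout and parameter names never reach the hash."""
    encoded = json.dumps(normalize(params, body)).encode("utf-8")
    return "#" + hashlib.sha3_512(encoded).hexdigest()[:10]

add_hash = hash_definition(["a", "b"], ["+", "a", "b"])
total_hash = hash_definition(["addend1", "addend2"], ["+", "addend1", "addend2"])

print(add_hash == total_hash)   # True
```

Whitespace never enters the picture because the hash is taken over the tree, not over the text that produced it.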

At the time we created total, #aut6jgfc1j was pointing to our original code, and there were no aliases referencing it (because sum had moved). So Unison says “this function already exists in the repository, so I don’t need to store it again. All I have to do is add the alias total to the existing hash.”

This means that the next time we fetch the source for sumSquare it can resolve the function call to the name total.

Our add two natural numbers function was compiled, hashed, and stored in the repository at the very beginning of this episode. Since then, it hasn’t changed—it can’t change. But, locally, we’ve referred to it using three aliases as well as by its internal hash, we changed its name and its implementation, and all the while the rest of our code just kept running.

You’re writing an application that uses external libraries.

Imagine that you didn’t have to use a package manager or edit a TOML or JSON file to add a library. Instead you just called it, and the then-current version of it was associated with your code. You don’t need a local copy; ucm can handle all that behind the scenes. Maybe it does caching, but who cares?

Imagine coming back to that code two years later and expecting it to just run. Why wouldn’t it? Nothing has changed.

Imagine being able to change a function that you or someone else wrote and not having to worry about breaking code that depends on that function.

Imagine being able to distribute code by just giving the target machine the hash of your main function.

Imagine immutable code.

There’s Always More

This article is already twice as long as I’d hoped, so I’ve had to leave out a lot. In particular, ucm basically has a git-like diff/merge/patch system built in, along with the ability to traverse dependencies. This makes it easy to cherry-pick updates when new versions of functions you use are produced. It also allows you to generate a patch file that others can apply to their local namespace.

Obviously, there’s also a whole lot to be said about Unison the language, but in these articles I wanted to focus on specific and unique features that I find exciting.

The next article in this series will talk about Unison abilities. These let you isolate mutable state and scope access to it via the call chain, not syntactically.

After that I want to talk about the way Unison abilities made me rethink how distributed code could work.

Footnotes

  1. Unison is a crazily innovative language. As well as this article, I’ll be writing about its effect system and about its super-easy distributed computing model. But this isn’t a Unison tutorial (let me know if you want one). Instead, I’m just using it to illustrate some points.↩︎

  2. The term natural numbers here means positive integers along with zero.↩︎

  3. If you’ve come across Smalltalk, this is quite similar to its idea of an image.↩︎

  4. Unison Code Manager↩︎

  5. Or, at least, me as of a few months ago.↩︎

  6. Which means that there are no more discussions about source layout. You can submit source to ucm formatted using arcane Tarot rules, but when it comes back out it will look like every other piece of Unison code.↩︎

  7. #r4ohr76lvt versus the original #aut6jgfc1j↩︎
