Karol Kuczmarski's Blog

The “let” type trick in Rust

Posted on Wed 01 February 2017 in Code • Tagged with Rust, types, pattern matching • Leave a comment

Here’s a neat little trick that’s especially useful if you’re just starting out with Rust.

Because the language uses type inference all over the place (or at least within a single function), it can often be difficult to figure out the type of an expression by yourself. Such knowledge is very handy in resolving compiler errors, which may be rather complex when generics and traits are involved.

The formula itself is very simple. Its shortest, most common version — and arguably the cleverest one, too — is the following let binding:

let () = some_expression;

In virtually all cases, this binding will cause a type error on its own, so it’s not something you’d leave permanently in your regular code.

But the important part here is the exact error message you get:

error[E0308]: mismatched types
  --> <anon>:42:13
   |
42 |         let () = some_expression;
   |             ^^ expected f64, found ()
   |
   = note: expected type `f64`
   = note:    found type `()`

The type expected by Rust here (in this example, f64) is also the type of some_expression. No more, no less.

There is nothing particularly wrong with using this technique and not caring too much how it works under the hood. But if you do want to know a little more what exactly is going on here, the rest of this post covers it in some detail.

The unit

Firstly, you may be wondering about this curious () type that the compiler has apparently found in the statement above. The official name for it is the unit type, and it has several notable characteristics:

There exists only one value¹ of this type: () (same symbol as the type itself).
It represents an empty tuple and has therefore the size of zero.
It is the type of any expression that’s turned into a statement.

That last fact is particularly interesting, as it makes () appear in error messages that are more indicative of syntactic mishaps rather than mismatched types:

fn positive_signum(x: i32) -> i32 {
    if x > 0 { 1i32 }
    0i32
}

error[E0308]: mismatched types
 --> <anon>:2:17
  |
2 |     if x > 0 { 1i32 }
  |                ^^^^ expected (), found i32
  |
  = note: expected type `()`
  = note:    found type `i32`

If you think about it, however, it makes perfect sense. The last expression inside a function body is the return value. This also means that everything before it has to be a statement: an expression of type ().

Working its way backward, Rust will therefore expect only such expressions before the final 0i32. This, in turn, puts the same constraint on the body of the if statement. The expression 1i32 (with its type of i32) clearly violates it, causing the above error².

“Expanded” version

A natural question now arises: is () inside of the let () = ... formula a type () or a value ()?…

To answer that, it’s quite helpful to compare and contrast the original binding with its longer “equivalent”:

let _: () = some_expression;

This statement is conceptually very similar to our original one. The error message it causes can also be used to debug issues with type inference.

Despite some cryptic symbols, the syntax here should also be more familiar. It occurs in many typical, ordinary bindings you can see in everyday Rust code. Here’s an example:

let x: i32 = 42;

where it’s abundantly clear that i32 is the type of variable x.

Analogously above, you can see that an unnamed symbol (_, the underscore) is declared to be of type ().

So in this alternate phrasing, () denotes a type.

Let a pattern emerge

What about the original form, let () = ...? There is no explicit type declaration here (i.e. no colon), and a pair of empty parentheses isn’t a name that could be assigned a new value.

What exactly is happening there, then?…

Well, it isn’t really anything special. While it may look exceptional, and totally unlike common usages of let, it is in fact exactly the same thing as a mundane let x = 5. The potential misconception here is about the exact meaning of x.

The simple version is that it’s a name for the bound expression.
But the actual truth is that it’s a pattern which is matched against that expression.

The terms “pattern” and “matching” here refer to the same mechanism that occurrs within the match statement. You could even imagine a peculiar form of desugaring, where a let statement is converted into a semantically equivalent match:

fn original() -> i32 {
    let x = 5;
    let y = 6;
    x + y
}

fn desugared() -> i32 {
    match 5 {
        x => match 6 {
            y => x + y
        }
    }
}

This analogy works perfectly³, because the patterns here are irrefutable: any value can match them, as all we’re doing is giving the value a name. Should the case be any different, Rust would reject our let statement — just like it rejects a match block that doesn’t include branches for all possible outcomes.

An empty pattern

But just because a pattern has to always match the expression, it doesn’t mean only simple identifiers like x or y are permitted in let. If Rust is able to statically ensure a match, it is perfectly OK to use a pattern with an internal structure⁴:

use std::num::Wrapping;
let Wrapping(x) = Wrapping(42);

Of course, something like this is just superfluous and silly. Same mechanism, however, is also behind the ability to “initialize multiple variables”:

let (x, y) = (0, 1);

What really happens is that we take a tuple expression (0, 1) and match it against a pattern (x, y). Because it is trivially satisified, we have the symbols x and y bound to the tuple elements. For all intents and purposes, this is equivalent to having two separate let statements:

let x = 0;
let y = 1;

Of course, a 2-tuple is not the only pattern of this kind we can use in let. Others possible patterns include, for example, the 0-tuple.

Or, as we express it in Rust, ():

let () = ();

Now that’s a truly useless statement! But it also harkens straight to our debug binding. It should be pretty clear now how it works:

The () stanza on the left is neither a type nor a name, but a pattern.
The expression on the right is being matched against this pattern.
Because the types of both of those things differ, the compiler signals an appropriate error.

The curious thing is that there is nothing inherently magical about using () on the left hand side. It’s simply the shortest pattern we can put after let. It’s also one that’s extremely unlikely to actually match the right hand side, which ensures we get the desired error. But if you substituted something equally exotic and rare — say, (x, ((y, z), Wrapping(w))) — it would work equally well as a rudimentary type detector.

Except for one thing, of course: nobody wants to type this much! Borne out of this frugality (and/or laziness), a custom thus emerged to use ().

Short, sweet, and clever.

A more formal, type-theoretic formulation of this fact is saying that () is inhabited by only one value. ↩
In case you are wondering, one possible fix here is to return 1i32; inside the if. An (arguably more idiomatic) alternative is to put 0i32 in an else branch, turning the entire if construct into the last — and only — expression in the function body. ↩
Note how each nested match is also introducing a new scope, exactly like the canonical desugaring of let which is often used to explain lifetimes and borrowing. ↩
Unfortunately, Rust isn’t currently capable of proving that the pattern is irrefutable in all obvious cases. For example, let Some(x) = Some(42); will be rejected due to the existence of a None variant in Option, even though it isn’t actually used in the (constant) expression on the right. ↩

& vs. ref in Rust patterns

Posted on Thu 02 June 2016 in Code • Tagged with Rust, pattern matching, borrowing, references • Leave a comment

Rust is one of those nice languages with pattern matching. If you don’t know, it can be thought of as a generalization of the switch statement: comparing objects not just by value (or overloaded equality operator, etc.) but by structure:

match hashmap.get(&key) {
    Some(value) => do_something_with(value),
    None => { panic!("Oh noes!"); },
}

It doesn’t end here. As you can see above, objects can also be destructured during the match (Some(value)), their parts assigned to bindings (value), and those bindings can subsequently be used in the match branch.

Neat? Definitely. In Rust, pattern matching is bread-and-butter of not only the match statement, but also for, (if)let, and even ordinary function arguments.

Mixing in Rust semantics

For a long time, however, I was somewhat confused as to what happens when references and borrowing is involved in matching. The two “operators” that often occur there are & (ampersand) and ref.

You should readily recognize the first one, as it is used pervasively in Rust to create references (and reference types). The second one quite obviously hints towards references as well. Yet those two constructs serve very different purposes when used within a pattern.

To add to the confusion, they are quite often encountered together:

use hyper::Url;

// print query string params of some URL
let url = Url::parse(some_url).unwrap();
let query_params: Vec<(String, String)> = url.query_pairs().unwrap_or(vec![]);
for &(ref name, ref value) in &query_params {
    println!("{}={}", name, value);
}

Lack of one or the other will be (at best) pointed out to you by the compiler, along with a suggestion where to add it. But addressing problems in this manner can only go so far. So how about we delve deeper and see what it’s really about?

Part of the reference, part of the pattern

Rust is very flexible as to what value can be a subject of pattern matching. You would be very hard pressed to find anything that cannot be used within a match statement, really. Both actual objects and references to objects are perfectly allowed:

struct Foo(i32);
// ...
let foo = &Foo(42);
match foo {
    x => println!("Matched!"),
}

In the latter case, however, we aren’t typically interested in the reference itself (like above). Instead, we want to determine some facts about the object it points to:

match foo {
    &Foo(num) => println!("Matched with number {}", num),
}

As you can see, this is where the ampersand comes into play. Just like a type constructor (Some, Ok, or Foo above), the & operator informs the Rust compiler what kind of value we’re expecting from the match. When it sees the ampersand, it knows we’re looking for references to certain objects, and not for the objects themselves.

Why is the distinction between an object and its reference important, though? In many other places, Rust is perfectly happy to blur the gap between references and actual objects¹ — for example when calling most of their methods.

Pattern matching, however, due to its ability to unpack values into their constituent parts, is a destructive operation. Anything we apply match (or similar construct) to will be moved into the block by default:

let maybe_name = Some(String::from("Alice"));
// ...
match maybe_name {
    Some(n) => println!("Hello, {}", n),
    _ => {},
}
do_something_with(maybe_name)

Following the typical ownership semantics, this will prevent any subsequent moves and essentially consume the value:

error: use of partially moved value: `maybe_name` [E0382]
    do_something_with(maybe_name);
                      ^~~~~~~~~~

So just like the aforementioned type constructors (Some, etc.), the ampersand operator is simply part of the pattern that we match against. And just like with Some and friends, there is an obvious symmetry here: if & was used to create the value, it needs to be used when unpacking it.

The syntax used in a pattern that destructures an object is analogous to one used by the expression which created it.

Preventing the move

Errors like the one above often contain helpful notes:

note: `(maybe_name:core::option::Option::Some).0` moved here because it has type `collections::string::String`, which is moved by default
         Some(n) => println!("Hello, {}", n),
              ^

as well as hints for resolving them:

help: if you would like to borrow the value instead, use a `ref` binding as shown:
        Some(ref n) => println!("Hello, {}", n),

Here’s where ref enters the scene.

The message tells us that if we add a ref keyword in the suggested spot, we will switch from moving to borrowing for the match binding that follows (here, n). It will still capture its value exactly as before, but it will no longer assume ownership of it.

This is the crucial difference.

Unlike the ampersand, ref is not something we match against. It doesn’t affect what values match the pattern it’s in, and what values don’t².

The only thing it changes is how parts of the matched value are captured by the pattern’s bindings:

by default, without ref, they are moved into the match arms
with ref, they are borrowed instead and represented as references

Looking at our example, the n binding in Some(n) is of type String: the actual field type from the matched structure. By contrast, the other n in Some(ref n) is a &String — that is, a reference to the field.

One is a move, the other one is a borrow.

ref annotates pattern bindings to make them borrow rather than move. It is not a part of the pattern as far as matching is concerned.

Used together

To finish off, let’s untangle the confusing example from the beginning of this post:

for &(ref name, ref value) in &query_params {
    println!("{}={}", name, value);
}

Since we know ref doesn’t affect whether or not the pattern matches, we could just as well have something like &(a, b). And this should be quite a bit easier to read: it clearly denotes we expect a reference to a 2-tuple of simple objects. Not coincidentally, such tuples are items from the vector we’re iterating over.

Problem is, without the refs we will attempt to move those items into the loop scope. But due to the way the vector is iterated over (&query_params), we’re only borrowing each item, so this is actually impossible. In fact, it would be a classic attempt to move out of a borrowed context.

It is also wholly unnecessary. The only thing this loop does is printing the items out, so accessing them through references is perfectly fine.

And this is exactly what the ref operator gives us. Adding the keyword back, we will switch from moving the values to just borrowing them instead.

To sum up

& denotes that your pattern expects a reference to an object. Hence & is a part of said pattern: &Foo matches different objects than Foo does.
ref indicates that you want a reference to an unpacked value. It is not matched against: Foo(ref foo) matches the same objects as Foo(foo).

The technical term for this is a Deref coercion. ↩
We can say that it doesn’t affect the satisfiability (or conversely, refutability) of the pattern. ↩