The “let” type trick in Rust
Posted on Wed 01 February 2017 in Code • Tagged with Rust, types, pattern matching
Here’s a neat little trick that’s especially useful if you’re just starting out with Rust.
Because the language uses type inference all over the place (or at least within a single function), it can often be difficult to figure out the type of an expression by yourself. Such knowledge is very handy in resolving compiler errors, which may be rather complex when generics and traits are involved.
The formula itself is very simple.
Its shortest, most common version (and arguably the cleverest one, too) is the following let binding:
let () = some_expression;
In virtually all cases, this binding will cause a type error on its own, so it’s not something you’d leave permanently in your regular code.
But the important part here is the exact error message you get:
error[E0308]: mismatched types
--> <anon>:42:13
|
42 | let () = some_expression;
| ^^ expected f64, found ()
|
= note: expected type `f64`
= note: found type `()`
The type expected by Rust here (in this example, f64) is also the type of some_expression. No more, no less.
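The “virtually all cases” caveat has one exception: when the expression already has type (), the binding type-checks and no error appears, which by itself tells you the type. A minimal sketch; clear_and_report is a hypothetical helper, and Vec::clear and std::mem::drop were picked simply because both return ():

```rust
// Hypothetical helper: `Vec::clear` returns `()`, so `let () = ...` compiles here,
// silently confirming that the expression's type is `()`.
fn clear_and_report(v: &mut Vec<i32>) -> usize {
    let () = v.clear();
    v.len()
}

fn main() {
    let mut numbers = vec![1, 2, 3];
    // `std::mem::drop` also returns `()`, so this binding compiles as well.
    let () = std::mem::drop(numbers.clone());
    println!("{}", clear_and_report(&mut numbers)); // prints 0
}
```

In other words, if the trick produces no error at all, the expression was of type () to begin with.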
There is nothing particularly wrong with using this technique and not caring too much about how it works under the hood. But if you do want to know a little more about what exactly is going on here, the rest of this post covers it in some detail.
The unit
Firstly, you may be wondering about this curious () type that the compiler has apparently found in the statement above.
The official name for it is the unit type, and it has several notable characteristics:
- There exists only one value¹ of this type: () (the same symbol as the type itself).
- It represents an empty tuple and therefore has a size of zero.
- It is the type of any expression that’s turned into a statement.
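All three properties can be checked in a few lines. A small sketch; the function names are hypothetical and exist only for illustration:

```rust
use std::mem::size_of;

// `()` names the unit type and also spells its only value.
fn the_unit_value() -> () {
    ()
}

// An empty tuple holds no data, so it occupies zero bytes.
fn unit_size() -> usize {
    size_of::<()>()
}

// A trailing semicolon turns `x + 1` into a statement,
// so the block evaluates to `()` and the sum is discarded.
fn statement_value(x: i32) -> () {
    let unit: () = {
        x + 1;
    };
    unit
}

fn main() {
    assert_eq!(the_unit_value(), ());
    assert_eq!(statement_value(41), ());
    println!("size of (): {}", unit_size()); // prints "size of (): 0"
}
```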
That last fact is particularly interesting, as it makes () appear in error messages that point to syntactic mishaps rather than to mismatched types:
fn positive_signum(x: i32) -> i32 {
    if x > 0 { 1i32 }
    0i32
}
error[E0308]: mismatched types
--> <anon>:2:17
|
2 | if x > 0 { 1i32 }
| ^^^^ expected (), found i32
|
= note: expected type `()`
= note: found type `i32`
If you think about it, however, it makes perfect sense.
The last expression inside a function body is the return value.
This also means that everything before it has to be a statement: an expression of type (). Working its way backward, Rust will therefore expect only such expressions before the final 0i32. This, in turn, puts the same constraint on the body of the if statement. The expression 1i32 (with its type of i32) clearly violates it, causing the above error².
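For completeness, here is a sketch of both fixes suggested in the second footnote, applied to the function above (positive_signum_return is a hypothetical name for the first variant):

```rust
// Fix 1: return early from inside the `if`; the `if` is then an expression
// of type `()` used as a statement, which satisfies the constraint.
fn positive_signum_return(x: i32) -> i32 {
    if x > 0 {
        return 1i32;
    }
    0i32
}

// Fix 2: add an `else` branch, so the whole `if` becomes the body's final expression.
fn positive_signum(x: i32) -> i32 {
    if x > 0 { 1i32 } else { 0i32 }
}

fn main() {
    for x in [-5, 0, 7] {
        // Both versions agree on every input.
        assert_eq!(positive_signum(x), positive_signum_return(x));
        println!("{} -> {}", x, positive_signum(x));
    }
}
```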
“Expanded” version
A natural question now arises: is () inside of the let () = ... formula a type () or a value ()?…
To answer that, it’s quite helpful to compare and contrast the original binding with its longer “equivalent”:
let _: () = some_expression;
This statement is conceptually very similar to our original one. The error message it causes can also be used to debug issues with type inference.
Despite some cryptic symbols, the syntax here should also be more familiar. It occurs in many typical, ordinary bindings you can see in everyday Rust code. Here’s an example:
let x: i32 = 42;
where it’s abundantly clear that i32 is the type of variable x.
Analogously, in the statement above, an unnamed symbol (_, the underscore) is declared to be of type ().
So in this alternate phrasing, () denotes a type.
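The annotated form also works in the other direction: once an error message has told you a type, an annotated binding can confirm the guess, because it compiles only if the annotation is right. A sketch, with checked_types as a hypothetical function:

```rust
// Hypothetical example: each annotated binding compiles only if its type is right.
// Swapping any of these annotations for `()` brings back the "mismatched types" error.
fn checked_types(xs: &[f64]) -> (f64, Option<&f64>, usize) {
    let total: f64 = xs.iter().sum();
    let first: Option<&f64> = xs.first();
    // `_` discards the value; only the annotation is being checked.
    let _: std::slice::Iter<'_, f64> = xs.iter();
    let count: usize = xs.len();
    (total, first, count)
}

fn main() {
    let xs = [1.5f64, 2.5, 3.0];
    let (total, first, count) = checked_types(&xs);
    println!("{} {:?} {}", total, first, count); // prints "7 Some(1.5) 3"
}
```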
Let a pattern emerge
What about the original form, let () = ...?
There is no explicit type declaration here (i.e. no colon),
and a pair of empty parentheses isn’t a name that could be assigned a new value.
What exactly is happening there, then?…
Well, it isn’t really anything special.
While it may look exceptional, and totally unlike common usages of let, it is in fact exactly the same thing as a mundane let x = 5.
The potential misconception here is about the exact meaning of x.
The simple version is that it’s a name for the bound expression.
But the actual truth is that it’s a pattern which is matched against that expression.
The terms “pattern” and “matching” here refer to the same mechanism that occurs within the match statement.
You could even imagine a peculiar form of desugaring, where a let statement is converted into a semantically equivalent match:
fn original() -> i32 {
    let x = 5;
    let y = 6;
    x + y
}

fn desugared() -> i32 {
    match 5 {
        x => match 6 {
            y => x + y,
        },
    }
}
This analogy works perfectly³, because the patterns here are irrefutable: any value can match them, as all we’re doing is giving the value a name.
If the pattern could ever fail to match, Rust would reject our let statement, just like it rejects a match block that doesn’t include branches for all possible outcomes.
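The parallel with match can be sketched as follows; double and unwrap_or_zero are hypothetical names. A bare name needs only one arm, while a variant pattern like Some(x) cannot stand alone:

```rust
// A bare name matches any value, so a single arm is exhaustive;
// this is exactly what `let v = value;` relies on.
fn double(value: i32) -> i32 {
    match value {
        v => v * 2,
    }
}

// `Some(x)` alone is refutable, so `match` requires a `None` arm as well;
// for the same reason, `let Some(x) = opt;` would be rejected.
fn unwrap_or_zero(opt: Option<i32>) -> i32 {
    match opt {
        Some(x) => x,
        None => 0,
    }
}

fn main() {
    println!("{}", double(21));              // prints 42
    println!("{}", unwrap_or_zero(None));    // prints 0
    println!("{}", unwrap_or_zero(Some(9))); // prints 9
}
```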
An empty pattern
But just because a pattern always has to match the expression doesn’t mean that only simple identifiers like x or y are permitted in let. If Rust is able to statically ensure a match, it is perfectly OK to use a pattern with an internal structure⁴:
use std::num::Wrapping;
let Wrapping(x) = Wrapping(42);
Of course, something like this is just superfluous and silly. The same mechanism, however, is also behind the ability to “initialize multiple variables”:
let (x, y) = (0, 1);
What really happens is that we take a tuple expression (0, 1) and match it against a pattern (x, y). Because the match is trivially satisfied, the symbols x and y are bound to the tuple elements.
For all intents and purposes, this is equivalent to having two separate let statements:
let x = 0;
let y = 1;
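A sketch of both patterns from this section, plus one common practical use: unpacking a tuple returned from a function. div_rem and destructure are hypothetical helpers:

```rust
use std::num::Wrapping;

// Hypothetical helper returning two results at once as a tuple.
fn div_rem(a: i32, b: i32) -> (i32, i32) {
    (a / b, a % b)
}

fn destructure() -> (u32, i32, i32, i32, i32) {
    // A pattern with internal structure: binds `n` to the value inside `Wrapping`.
    let Wrapping(n) = Wrapping(42u32);
    // A tuple pattern; equivalent to `let x = 0; let y = 1;`.
    let (x, y) = (0, 1);
    // The same mechanism unpacks a tuple returned from a function.
    let (q, r) = div_rem(17, 5);
    (n, x, y, q, r)
}

fn main() {
    println!("{:?}", destructure()); // prints "(42, 0, 1, 3, 2)"
}
```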
Of course, a 2-tuple is not the only pattern of this kind we can use in let. Other possible patterns include, for example, the 0-tuple. Or, as we express it in Rust, ():
let () = ();
Now that’s a truly useless statement! But it also leads straight back to our debug binding. It should be pretty clear now how it works:
- The () on the left is neither a type nor a name, but a pattern.
- The expression on the right is being matched against this pattern.
- Because the pattern () can only match a value of type (), while the expression has some other type, the compiler signals a mismatched types error.
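To answer the earlier question directly, all three roles of () can appear in a single statement that does compile. A minimal sketch (unit_three_ways is a hypothetical name):

```rust
// One statement, three roles: the pattern `()` before the colon,
// the type `()` after it, and the value `()` on the right.
fn unit_three_ways() -> &'static str {
    let (): () = ();
    // The same `()` pattern in a `match`; a single arm covers the only possible value.
    match () {
        () => "matched",
    }
}

fn main() {
    println!("{}", unit_three_ways()); // prints "matched"
}
```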
The curious thing is that there is nothing inherently magical about using () on the left-hand side. It’s simply the shortest pattern we can put after let that also pins down a specific type (a bare name or _ would match anything).
It’s also one that’s extremely unlikely to actually match the right-hand side, which ensures we get the desired error.
But if you substituted something equally exotic and rare, say (x, ((y, z), Wrapping(w))), it would work equally well as a rudimentary type detector.
Except for one thing, of course: nobody wants to type this much!
Born out of this frugality (and/or laziness), the custom of using () thus emerged.
Short, sweet, and clever.
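When the value on the right really does have the shape of that longer pattern, the binding is just an ordinary destructuring let. A sketch, with exotic as a hypothetical function and arbitrary integer types chosen for illustration:

```rust
use std::num::Wrapping;

fn exotic() -> (u8, u16, u32, u64) {
    let value = (1u8, ((2u16, 3u32), Wrapping(4u64)));
    // The long pattern mentioned above. When the value has exactly this shape,
    // it just binds four names; a value of any other shape on the right
    // (say, an f64) would produce a "mismatched types" error instead.
    let (x, ((y, z), Wrapping(w))) = value;
    (x, y, z, w)
}

fn main() {
    println!("{:?}", exotic()); // prints "(1, 2, 3, 4)"
}
```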
1. A more formal, type-theoretic way to state this fact is that () is inhabited by only one value.
2. In case you are wondering, one possible fix here is to return 1i32; inside the if. An (arguably more idiomatic) alternative is to put 0i32 in an else branch, turning the entire if construct into the last (and only) expression in the function body.
3. Note how each nested match also introduces a new scope, exactly like the canonical desugaring of let which is often used to explain lifetimes and borrowing.
4. Unfortunately, Rust isn’t currently capable of proving that the pattern is irrefutable in all obvious cases. For example,
let Some(x) = Some(42);
will be rejected due to the existence of a None variant in Option, even though it isn’t actually used in the (constant) expression on the right.