Karol Kuczmarski's Blog

The “let” type trick in Rust

Posted on Wed 01 February 2017 in Code • Tagged with Rust, types, pattern matching

Here’s a neat little trick that’s especially useful if you’re just starting out with Rust.

Because the language uses type inference all over the place (or at least within a single function), it can often be difficult to figure out the type of an expression by yourself. Such knowledge is very handy in resolving compiler errors, which may be rather complex when generics and traits are involved.

The formula itself is very simple. Its shortest, most common version — and arguably the cleverest one, too — is the following let binding:

let () = some_expression;

In virtually all cases, this binding will cause a type error on its own, so it’s not something you’d leave permanently in your regular code.

But the important part here is the exact error message you get:

error[E0308]: mismatched types
  --> <anon>:42:13
   |
42 |         let () = some_expression;
   |             ^^ expected f64, found ()
   |
   = note: expected type `f64`
   = note:    found type `()`

The type expected by Rust here (in this example, f64) is also the type of some_expression. No more, no less.
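To try this out, a minimal sketch could look like the one below. Both some_expression and the helper type_name_of are hypothetical stand-ins that do not appear in the original example. The let () line is left commented out so that the program compiles, and std::any::type_name serves only as a runtime cross-check of what the compiler error would say.

```rust
use std::any::type_name;

// Hypothetical helper: reports the static type of a value as a string.
fn type_name_of<T>(_: &T) -> &'static str {
    type_name::<T>()
}

// Hypothetical stand-in for an expression whose type we want to learn.
fn some_expression() -> f64 {
    "2.5".parse::<f64>().unwrap_or(0.0) * 2.0
}

fn main() {
    let value = some_expression();
    // Uncommenting the next line stops compilation with error E0308,
    // and the message names the real type of `value`.
    // let () = value;
    println!("{}", type_name_of(&value));
}
```

Unlike the let () binding, the helper only works once the program compiles, so it complements the trick rather than replacing it.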

There is nothing particularly wrong with using this technique and not caring too much how it works under the hood. But if you do want to know a little more about what exactly is going on here, the rest of this post covers it in some detail.

The unit

Firstly, you may be wondering about this curious () type that the compiler has apparently found in the statement above. The official name for it is the unit type, and it has several notable characteristics:

There exists only one value¹ of this type: () (same symbol as the type itself).
It represents an empty tuple and therefore has a size of zero.
It is the type of any expression that’s turned into a statement.
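These three properties can be checked with a short sketch. The functions answer and statement_value are made up for the illustration.

```rust
use std::mem::size_of;

// Hypothetical function returning an ordinary value.
fn answer() -> i32 {
    42
}

// No return type is declared, so this function returns ().
fn statement_value() {
    // The trailing semicolon turns the i32 expression into a statement.
    answer();
}

fn main() {
    // The only value of type () is written ().
    let unit: () = ();
    // An empty tuple occupies no memory.
    assert_eq!(size_of::<()>(), 0);
    // A body that ends in a statement evaluates to that same single value.
    assert_eq!(unit, statement_value());
    println!("all unit checks passed");
}
```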

That last fact is particularly interesting, as it makes () appear in error messages that are more indicative of syntactic mishaps than of mismatched types:

fn positive_signum(x: i32) -> i32 {
    if x > 0 { 1i32 }
    0i32
}

error[E0308]: mismatched types
 --> <anon>:2:17
  |
2 |     if x > 0 { 1i32 }
  |                ^^^^ expected (), found i32
  |
  = note: expected type `()`
  = note:    found type `i32`

If you think about it, however, it makes perfect sense. The last expression inside a function body is the return value. This also means that everything before it has to be a statement: an expression of type ().

Working its way backward, Rust will therefore expect only such expressions before the final 0i32. This, in turn, puts the same constraint on the body of the if statement. The expression 1i32 (with its type of i32) clearly violates it, causing the above error².
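One way to satisfy the compiler is to return early from inside the if. Another is to add an else branch, so that the whole if becomes the final expression of the function. A sketch of both follows; the two function names are invented for this comparison.

```rust
// Fix 1: an explicit early return makes the `if` body a statement of type ().
fn positive_signum_return(x: i32) -> i32 {
    if x > 0 {
        return 1i32;
    }
    0i32
}

// Fix 2: with an `else` branch, the whole `if` is the function's last expression.
fn positive_signum_else(x: i32) -> i32 {
    if x > 0 { 1i32 } else { 0i32 }
}

fn main() {
    for x in [-3, 0, 7] {
        println!("{} -> {} / {}", x, positive_signum_return(x), positive_signum_else(x));
    }
}
```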

“Expanded” version

A natural question now arises: is () inside of the let () = ... formula a type () or a value ()?…

To answer that, it’s quite helpful to compare and contrast the original binding with its longer “equivalent”:

let _: () = some_expression;

This statement is conceptually very similar to our original one. The error message it causes can also be used to debug issues with type inference.

Despite some cryptic symbols, the syntax here should also be more familiar. It occurs in many typical, ordinary bindings you can see in everyday Rust code. Here’s an example:

let x: i32 = 42;

where it’s abundantly clear that i32 is the type of variable x.

Analogously, in the let _: () = ... statement you can see that an unnamed symbol (_, the underscore) is declared to be of type ().

So in this alternate phrasing, () denotes a type.
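Once the real type is known, the same annotated form compiles when the correct type is filled in. The sketch below assumes a made-up average function to show this.

```rust
// Hypothetical function whose result type we pretend not to know at the call site.
fn average(values: &[f64]) -> f64 {
    values.iter().sum::<f64>() / values.len() as f64
}

fn main() {
    let mean = average(&[1.0, 2.0, 3.0]);
    // Compiles because `mean` really is an f64. Writing `()` in place of
    // `f64` here would produce the mismatched-types error instead.
    let _: f64 = mean;
    println!("{}", mean);
}
```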

Let a pattern emerge

What about the original form, let () = ...? There is no explicit type declaration here (i.e. no colon), and a pair of empty parentheses isn’t a name that could be assigned a new value.

What exactly is happening there, then?…

Well, it isn’t really anything special. While it may look exceptional, and totally unlike common usages of let, it is in fact exactly the same thing as a mundane let x = 5. The potential misconception here is about the exact meaning of x.

The simple version is that it’s a name for the bound expression.
But the actual truth is that it’s a pattern which is matched against that expression.

The terms “pattern” and “matching” here refer to the same mechanism that occurs within the match statement. You could even imagine a peculiar form of desugaring, where a let statement is converted into a semantically equivalent match:

fn original() -> i32 {
    let x = 5;
    let y = 6;
    x + y
}

fn desugared() -> i32 {
    match 5 {
        x => match 6 {
            y => x + y
        }
    }
}

This analogy works perfectly³, because the patterns here are irrefutable: any value can match them, as all we’re doing is giving the value a name. Should the case be any different, Rust would reject our let statement — just like it rejects a match block that doesn’t include branches for all possible outcomes.
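For contrast, here is a sketch of a refutable pattern, one that may fail to match. A plain let would be rejected for it, while if let or an exhaustive match is accepted. The function names are illustrative only.

```rust
// `let Some(x) = value;` would not compile here, because the pattern
// does not cover the None case.
fn describe_with_if_let(value: Option<i32>) -> String {
    if let Some(x) = value {
        format!("got {}", x)
    } else {
        String::from("nothing")
    }
}

// The same logic as a match that lists every possible outcome.
fn describe_with_match(value: Option<i32>) -> String {
    match value {
        Some(x) => format!("got {}", x),
        None => String::from("nothing"),
    }
}

fn main() {
    println!("{}", describe_with_if_let(Some(5)));
    println!("{}", describe_with_match(None));
}
```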

An empty pattern

But just because a pattern has to always match the expression, it doesn’t mean only simple identifiers like x or y are permitted in let. If Rust is able to statically ensure a match, it is perfectly OK to use a pattern with an internal structure⁴:

use std::num::Wrapping;
let Wrapping(x) = Wrapping(42);

Of course, something like this is just superfluous and silly. The same mechanism, however, is also behind the ability to “initialize multiple variables”:

let (x, y) = (0, 1);

What really happens is that we take a tuple expression (0, 1) and match it against a pattern (x, y). Because it is trivially satisfied, we have the symbols x and y bound to the tuple elements. For all intents and purposes, this is equivalent to having two separate let statements:

let x = 0;
let y = 1;
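A more practical use of the same tuple pattern is unpacking a function's result. The sketch below assumes two made-up functions, div_rem and format_division.

```rust
// Hypothetical function returning two values at once as a tuple.
fn div_rem(a: i32, b: i32) -> (i32, i32) {
    (a / b, a % b)
}

fn format_division(a: i32, b: i32) -> String {
    // The returned tuple is matched against the irrefutable pattern (quotient, remainder).
    let (quotient, remainder) = div_rem(a, b);
    format!("{} r {}", quotient, remainder)
}

fn main() {
    println!("{}", format_division(17, 5));
}
```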

Of course, a 2-tuple is not the only pattern of this kind we can use in let. Other possible patterns include, for example, the 0-tuple.

Or, as we express it in Rust, ():

let () = ();

Now that’s a truly useless statement! But it also harkens straight to our debug binding. It should be pretty clear now how it works:

The () stanza on the left is neither a type nor a name, but a pattern.
The expression on the right is being matched against this pattern.
Because the types of both of those things differ, the compiler signals an appropriate error.

The curious thing is that there is nothing inherently magical about using () on the left hand side. It’s simply the shortest pattern we can put after let. It’s also one that’s extremely unlikely to actually match the right hand side, which ensures we get the desired error. But if you substituted something equally exotic and rare — say, (x, ((y, z), Wrapping(w))) — it would work equally well as a rudimentary type detector.

Except for one thing, of course: nobody wants to type this much! Born out of this frugality (and/or laziness), a custom thus emerged to use ().

Short, sweet, and clever.

¹ A more formal, type-theoretic formulation of this fact is saying that () is inhabited by only one value.
² In case you are wondering, one possible fix here is to return 1i32; inside the if. An (arguably more idiomatic) alternative is to put 0i32 in an else branch, turning the entire if construct into the last (and only) expression in the function body.
³ Note how each nested match is also introducing a new scope, exactly like the canonical desugaring of let which is often used to explain lifetimes and borrowing.
⁴ Unfortunately, Rust isn’t currently capable of proving that the pattern is irrefutable in all obvious cases. For example, let Some(x) = Some(42); will be rejected due to the existence of a None variant in Option, even though it isn’t actually used in the (constant) expression on the right.

Rust: first impressions

Posted on Thu 10 December 2015 in Code • Tagged with Rust, pointers, types, FP, OOP, traits

Having recently been writing some C++ code at work, I had once again experienced the kind of exasperation that this cumbersome language evokes on a regular basis. When I was working in it less sporadically, I was shrugging it off and telling myself it’s all because of the low level it operates on. Superior performance was the other side of the deal, and it was supposed to make all the trade-offs worthwhile.

Now, however, I realized that running close to the metal by no means excuses the sort of clunkiness that C++ permits. For example, there really is no reason why the archaically asinine separation of header & source files — with its inevitable redundancy of declarations and definitions, worked around with Java-esque contraptions such as pimpl — is still the bread and butter of C++ programs.
Same goes for the lack of sane dependency management, or a universal, portable build system. None of those would be at odds with native compilation to machine code, or runtime speeds that are adequate for real-time programs.

Rather than dwelling on those gripes, I thought it’d be more productive to look around and see what the modern offering is in the domain of lower-level, really fast languages. The search wasn’t long at all, because right now it seems there is just one viable contender: Rust¹.

Rusty systems

Rust introduces itself as a “systems programming language”, which is quite a bold claim. What followed the last time this phrase was applied to an emerging language (Go) was a kind of word twisting that’s more indicative of politics than of computer science.

But Rust’s pretense to the system level is well justified. It clearly provides the requisite toolkit for working directly with the hardware, be it embedded controllers or fully featured computers. It offers compilation to native machine code; direct memory access; running time guarantees thanks to the lack of GC-induced stops; and great interoperability through static and dynamic linkage.

In short, with Rust you can wreak havoc against the RAM and twiddle bits to your heart’s content.

Safe and sound

To be fair, though, the “havoc” part is not entirely accurate. Despite its focus on the low level, efficient computing, Rust aims to be a very safe language. Unlike C, it actively tries to prevent the programmer from shooting themselves in the foot — though it will hand you the gun if you but ask for it.

The safety guarantees provided by Rust apply to resource management, with the specific emphasis on memory and pointers to it. The way that most contemporary languages deal with memory is by introducing a garbage collector which mostly (though not wholly) relieves the programmer from thinking about allocations and deallocations. However, the kind of global, stop-the-world garbage collection (e.g. mark-and-sweep) is costly and unpredictable, ruling it out as a mechanism for real-time systems.

For this reason, Rust doesn’t mandate a GC of this kind². And although it offers mechanisms that are similar to smart pointers from C++ (e.g. std::shared_ptr), it is actually preferable and safer to use regular, “naked” pointers: &Foo versus Cell<Foo> or RefCell<Foo> (which are some of Rust’s “smart pointer” types).

The trick is in the clever compiler. As long as we use regular pointers, it is capable of detecting potential memory bugs at compilation time. They are referred to as “data races” in Rust’s terminology, and include perennial problems that will segfault any C code which wasn’t written with utmost care.

Part of those safety guarantees is also the default immutability of references (pointers). The simplest reference of type &Foo in Rust translates to something like const Foo * const in C³. You have to explicitly request mutability with the mut keyword, and Rust ensures there is always at most one mutable reference to any value, thus preventing problems caused by pointer aliasing.
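The borrowing rules described here can be sketched in a few lines. The function names are invented for the example, and the rejected case is left as a comment so that the block compiles.

```rust
// Mutation requires an explicitly mutable reference.
fn increment(counter: &mut i32) {
    *counter += 1;
}

// Any number of shared (immutable) references may coexist.
fn sum_twice(value: &i32) -> i32 {
    let first = value;
    let second = value;
    *first + *second
}

fn demo() -> i32 {
    let mut n = 20;
    // Exactly one mutable reference exists for the duration of this call.
    increment(&mut n);
    // Rejected by the compiler, since two mutable borrows would be live at once:
    // let a = &mut n; let b = &mut n; *a += 1; *b += 1;
    sum_twice(&n)
}

fn main() {
    println!("{}", demo());
}
```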

But what if you really must sling raw pointers, and access arbitrary memory locations? Maybe you are programming a microcontroller where I/O is done through a special memory region. For those occasions, Rust has got you covered with the unsafe keyword:

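// Read the state of a diode in some imaginary uC.
fn get_led_state(i: isize) -> bool {
    // Valid indices for four lights are 0 through 3.
    assert!(i >= 0 && i < 4, "There are FOUR lights!");
    let p: *const u8 = 0x1234 as *const u8;  // known memory location
    // Method calls bind tighter than `*`, so this dereferences p.offset(i).
    unsafe { *p.offset(i) != 0 }
}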

Its usage, like in the above example, can be very localized, limited only to those places where it’s truly necessary and guarded by the appropriate checks. As a result, the interface exposed by the above function can be considered safe. The unrestricted memory access can be contained to where it’s really inevitable.
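Since a fixed address such as 0x1234 cannot be read on an ordinary machine, here is a variant that can actually be run. It assumes a hypothetical static array, LED_REGISTERS, standing in for the memory-mapped region, and uses std::ptr::read_volatile for the read, which is a common choice for hardware registers but not something the original example does.

```rust
use std::ptr;

// Hypothetical stand-in for a memory-mapped I/O region with four registers.
static LED_REGISTERS: [u8; 4] = [0, 1, 0, 1];

// Safe interface: callers cannot trigger an out-of-bounds read.
fn get_led_state(i: usize) -> bool {
    assert!(i < LED_REGISTERS.len(), "There are FOUR lights!");
    let base: *const u8 = LED_REGISTERS.as_ptr();
    // SAFETY: the assertion above keeps `base.add(i)` inside the array.
    unsafe { ptr::read_volatile(base.add(i)) != 0 }
}

fn main() {
    for i in 0..LED_REGISTERS.len() {
        println!("LED {} is {}", i, if get_led_state(i) { "on" } else { "off" });
    }
}
```

The unsafe block stays one line long, and the bounds check in front of it is what lets the function present a safe signature.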

Typing counts

Ensuring memory safety is not the only way in which Rust differentiates itself from C. What separates those two languages is also a few decades of practice and research into programming semantics. It’s only natural to expect Rust to take advantage of this progress.

And advantage it takes. Although Rust’s type system isn’t nearly as advanced and complex as, say, Scala’s, it exhibits several interesting properties that are indicative of its relatively modern origin.

First, it mixes the two most popular programming paradigms — functional and object-oriented — in roughly equal concentrations, as opposed to being biased towards the latter. Rust doesn’t have interfaces or classes: it has traits and their implementations. Even though they often fulfill similar purposes of abstraction and encapsulation, these constructs are closer to the concepts of type classes and their instances, which are found for example in Haskell.
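As a rough illustration of traits and their implementations, consider the sketch below. The Greet trait and the Person and Robot types are invented for this example.

```rust
// A trait declares behaviour; it may also supply default method bodies.
trait Greet {
    fn name(&self) -> String;

    fn greeting(&self) -> String {
        format!("Hello, {}!", self.name())
    }
}

struct Person {
    first_name: String,
}

struct Robot {
    id: u32,
}

// Each type gets its own implementation (an "instance", in Haskell terms).
impl Greet for Person {
    fn name(&self) -> String {
        self.first_name.clone()
    }
}

impl Greet for Robot {
    fn name(&self) -> String {
        format!("unit {}", self.id)
    }
}

// Generic code is written against the trait, not a concrete type.
fn welcome<T: Greet>(guest: &T) -> String {
    guest.greeting()
}

fn main() {
    println!("{}", welcome(&Person { first_name: "John".to_string() }));
    println!("{}", welcome(&Robot { id: 7 }));
}
```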

Still, the more familiar notions of OOP aren’t too far off. Most of the key functionality of classes, for example, can be simulated by implementing “default” traits for user-defined types:

struct Person {
    first_name: String,
    last_name: String,
}

impl Person {
    fn new(first_name: &str, last_name: &str) -> Person {
        Person {
            first_name: first_name.to_string(),
            last_name: last_name.to_string(),
        }
    }

    fn greet(&self) {
        println!("Hello, {}!", self.first_name);
    }
}

// usage
let p = Person::new("John", "Doe");
p.greet();

The second aspect of Rust’s type system that we would come to expect from a new language is its expressive power. Type inference is nowadays a staple, and above we can observe the simplest form of it. But it extends further, to generic parameters, closure arguments, and closure return values.
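A small sketch of inference at work on closures and generic parameters follows; the function squares_of_evens is a made-up example.

```rust
fn squares_of_evens(limit: u32) -> Vec<u32> {
    (0..limit)
        // The closure's argument type (&u32) is inferred from the iterator.
        .filter(|n| n % 2 == 0)
        // The argument and return types here (u32) are inferred as well.
        .map(|n| n * n)
        // The target collection is inferred from the function's return type.
        .collect()
}

fn main() {
    println!("{:?}", squares_of_evens(7));
}
```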

Generics, by the way, are quite nice as well. Besides their applicability to structs, type aliases, functions, traits, trait implementations, etc., they allow for constraining their arguments with traits. This is similar to the abandoned-and-not-quite-revived-yet idea of concepts in C++, or to an analogous mechanism from C#.
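Constraining a generic argument with traits might look like the sketch below, where largest is a hypothetical helper that accepts any element type that can be compared and copied.

```rust
// T must support ordering comparisons (PartialOrd) and cheap copies (Copy).
fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    let mut iter = items.iter().copied();
    // An empty slice has no largest element.
    let mut best = iter.next()?;
    for item in iter {
        if item > best {
            best = item;
        }
    }
    Some(best)
}

fn main() {
    println!("{:?}", largest(&[3, 7, 2]));
    println!("{:?}", largest(&[0.5, 0.25]));
}
```

Calling largest with a type that lacks PartialOrd is rejected at compile time, at the call site.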

The third common trend in contemporary language design is the use of type system to solve common tasks. Rust doesn’t go full Haskell and opt for monads for everything, but its Option and Result types are evidently the functional approach to error handling⁴. To facilitate their use, a powerful pattern matching facility is also present in Rust.
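Here is a sketch of Option and Result combined with pattern matching. Both functions and their error messages are invented for the illustration.

```rust
// Errors are ordinary values carried in the Err variant.
fn parse_and_halve(input: &str) -> Result<i32, String> {
    let number: i32 = match input.trim().parse() {
        Ok(n) => n,
        Err(_) => return Err(format!("not a number: {}", input)),
    };
    if number % 2 != 0 {
        return Err(format!("{} is odd", number));
    }
    Ok(number / 2)
}

// Absence of a result is expressed with Option rather than a sentinel value.
fn first_even_half(inputs: &[&str]) -> Option<i32> {
    inputs.iter().filter_map(|s| parse_and_halve(s).ok()).next()
}

fn main() {
    match parse_and_halve("10") {
        Ok(half) => println!("half is {}", half),
        Err(message) => println!("error: {}", message),
    }
    println!("{:?}", first_even_half(&["x", "3", "8"]));
}
```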

Unexpectedly pythonic

If your general go-to language is Python, you will find Rust a very nice complement and possibly a valuable instrument in your coding arsenal. Interoperability between Python and Rust is stupidly easy, thanks to both the ctypes module and the extreme simplicity of creating portable, shared libraries in Rust. Offloading some expensive, GIL-bypassing computation to a fast, native code written in Rust can thus be a relatively painless way of speeding up crucial parts of a Python program.

But somewhat more surprisingly, Rust has quite a few bits that seem to be directly inspired by Python semantics. Granted, those two languages are conceptually pretty far apart in general, but the analogies are there:

The concept of iterators in Rust is very similar to iterables in Python. Even the for loop is basically identical: rather than manually increment a counter, both in Rust and Python you iterate over a range of numbers.
Oh, and both languages have an enumerate method/function that yields pairs of (index, element).
Syntax for method definition in Rust uses the self keyword as first argument to distinguish between instance methods and “class”/“static” methods (or associated functions in Rust’s parlance). This is even more pythonic than in actual Python, where self is technically just a convention, albeit an extremely strong one.
In either language, overloading operators doesn’t use any new keywords or special syntax, like it does in C++, C#, and others. Python accomplishes it through __magic__ methods, whereas Rust has very similarly named operator traits.
Rust basically has doctest. If you don’t know, the doctest module is a standard Python testing utility that can run usage examples found in documentation comments and verify their correctness. The Rust version (rustdoc) is even more powerful and flexible, allowing you, for example, to mark additional boilerplate lines that should be run when testing examples but not included in the generated documentation.
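Several of the analogies above can be seen in one short sketch. The Vec2 type and the indexed function are made up for the purpose.

```rust
use std::ops::Add;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Vec2 {
    x: i32,
    y: i32,
}

// Operator overloading goes through a trait, much like __add__ in Python.
impl Add for Vec2 {
    type Output = Vec2;

    // `self` as the first argument marks this as an instance method.
    fn add(self, other: Vec2) -> Vec2 {
        Vec2 { x: self.x + other.x, y: self.y + other.y }
    }
}

// enumerate() yields (index, element) pairs, as Python's enumerate() does.
fn indexed(words: &[&str]) -> Vec<String> {
    let mut lines = Vec::new();
    for (i, word) in words.iter().enumerate() {
        lines.push(format!("{}: {}", i, word));
    }
    lines
}

fn main() {
    println!("{:?}", Vec2 { x: 1, y: 2 } + Vec2 { x: 3, y: 4 });
    println!("{:?}", indexed(&["a", "b"]));
}
```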

I’m sure the list doesn’t end here and will grow over time. As of this writing, for example, nightly builds of Rust already offer advanced slice pattern matching which is very similar to the extended iterable unpacking from Python 3.
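For reference, slice patterns of this kind have since become available in stable Rust. A sketch using a made-up describe function follows; the last arm plays roughly the role of first, *rest = items in Python 3.

```rust
fn describe(items: &[i32]) -> String {
    match items {
        [] => String::from("empty"),
        [only] => format!("one item: {}", only),
        // `rest @ ..` binds the remaining elements as a sub-slice.
        [first, rest @ ..] => format!("first: {}, remaining: {}", first, rest.len()),
    }
}

fn main() {
    println!("{}", describe(&[1, 2, 3]));
}
```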

Is it worth it?

Depending on your background and the programming domain you are working in, you may be wondering if Rust is a language that’s worth looking into now, or in the near future.

Firstly, let me emphasize that it’s still in its early stages. Although the stable version 1.0 was released a good couple of months ago, the ecosystem isn’t nearly as diverse and abundant as in some of the other new languages.

If you are specifically looking to deploy Rust-written API servers, backends, and other (shall I use the word) microservices, then right now you’ll probably be better served by more established solutions, like Java with fibers, asynchronous Python on PyPy, Erlang, Go, node.js, or similar. I predict Rust catching up here in the coming months, though, because the prospect of writing native speed JSON slingers with relative ease is just too compelling to pass up.

The other interesting area for Rust is game programming, because it’s one of the few languages capable of supporting even the most demanding AAA+ productions. The good news is that portable, open source game engines are already here. The bad news is that most of the existing knowledge about designing and coding high performance games is geared towards writing (stripped down) C++. The community is also rather ~~stubborn~~ reluctant to adopt anything that may carry even a hint of potentially unknown performance implications. Although some inroads have been made (here’s, for example, an entity component system written in Rust), and I wouldn’t be surprised to see indie games written in Rust, it probably won’t take over the industry anytime soon.

When it comes to hardware, though, Rust may already have the upper hand. It is obviously a much easier language to program in than pure C. Along with its toolchain’s ability to produce minimal executables, it makes for a compelling language for programming microcontrollers and other embedded devices.

So in short, Rust is pretty nice. And if you have read this far, I think you should just go ahead and have a look for yourself :)

¹ Because as much as we’d like for D to finally get somewhere, at this point we may have better luck waiting for the Year of Linux on Desktop to dawn…
² Of course, nobody has stopped the community from implementing it.
³ Strictly speaking, it’s the binding such as let x = &foo; that translates to it. Unadorned C pointer type Foo* would correspond to mutable binding to a mutable reference in Rust, i.e. let mut x = &mut foo;.
⁴ Their Haskell equivalents are the Maybe and Either types, respectively.