& vs. ref in Rust patterns

Posted on Thu 02 June 2016 in Code • Tagged with Rust, pattern matching, borrowing, referencesLeave a comment

Rust is one of those nice languages with pattern matching. If you don’t know, it can be thought of as a generalization of the switch statement: comparing objects not just by value (or overloaded equality operator, etc.) but by structure:

match hashmap.get(&key) {
    Some(value) => do_something_with(value),
    None => { panic!("Oh noes!"); },
}

It doesn’t end here. As you can see above, objects can also be destructured during the match (Some(value)), their parts assigned to bindings (value), and those bindings can subsequently be used in the match branch.

Neat? Definitely. In Rust, pattern matching is bread-and-butter of not only the match statement, but also for, (if)let, and even ordinary function arguments.

Mixing in Rust semantics

For a long time, however, I was somewhat confused as to what happens when references and borrowing is involved in matching. The two “operators” that often occur there are & (ampersand) and ref.

You should readily recognize the first one, as it is used pervasively in Rust to create references (and reference types). The second one quite obviously hints towards references as well. Yet those two constructs serve very different purposes when used within a pattern.

To add to the confusion, they are quite often encountered together:

use hyper::Url;

// print query string params of some URL
let url = Url::parse(some_url).unwrap();
let query_params: Vec<(String, String)> = url.query_pairs().unwrap_or(vec![]);
for &(ref name, ref value) in &query_params {
    println!("{}={}", name, value);
}

Lack of one or the other will be (at best) pointed out to you by the compiler, along with a suggestion where to add it. But addressing problems in this manner can only go so far. So how about we delve deeper and see what it’s really about?

Part of the reference, part of the pattern

Rust is very flexible as to what value can be a subject of pattern matching. You would be very hard pressed to find anything that cannot be used within a match statement, really. Both actual objects and references to objects are perfectly allowed:

struct Foo(i32);
// ...
let foo = &Foo(42);
match foo {
    x => println!("Matched!"),
}

In the latter case, however, we aren’t typically interested in the reference itself (like above). Instead, we want to determine some facts about the object it points to:

match foo {
    &Foo(num) => println!("Matched with number {}", num),
}

As you can see, this is where the ampersand comes into play. Just like a type constructor (Some, Ok, or Foo above), the & operator informs the Rust compiler what kind of value we’re expecting from the match. When it sees the ampersand, it knows we’re looking for references to certain objects, and not for the objects themselves.

Why is the distinction between an object and its reference important, though? In many other places, Rust is perfectly happy to blur the gap between references and actual objects1 — for example when calling most of their methods.

Pattern matching, however, due to its ability to unpack values into their constituent parts, is a destructive operation. Anything we apply match (or similar construct) to will be moved into the block by default:

let maybe_name = Some(String::from("Alice"));
// ...
match maybe_name {
    Some(n) => println!("Hello, {}", n),
    _ => {},
}
do_something_with(maybe_name)

Following the typical ownership semantics, this will prevent any subsequent moves and essentially consume the value:

error: use of partially moved value: `maybe_name` [E0382]
    do_something_with(maybe_name);
                      ^~~~~~~~~~

So just like the aforementioned type constructors (Some, etc.), the ampersand operator is simply part of the pattern that we match against. And just like with Some and friends, there is an obvious symmetry here: if & was used to create the value, it needs to be used when unpacking it.

The syntax used in a pattern that destructures an object is analogous to one used by the expression which created it.

Preventing the move

Errors like the one above often contain helpful notes:

note: `(maybe_name:core::option::Option::Some).0` moved here because it has type `collections::string::String`, which is moved by default
         Some(n) => println!("Hello, {}", n),
              ^

as well as hints for resolving them:

help: if you would like to borrow the value instead, use a `ref` binding as shown:
        Some(ref n) => println!("Hello, {}", n),

Here’s where ref enters the scene.

The message tells us that if we add a ref keyword in the suggested spot, we will switch from moving to borrowing for the match binding that follows (here, n). It will still capture its value exactly as before, but it will no longer assume ownership of it.

This is the crucial difference.

Unlike the ampersand, ref is not something we match against. It doesn’t affect what values match the pattern it’s in, and what values don’t2.

The only thing it changes is how parts of the matched value are captured by the pattern’s bindings:

  • by default, without ref, they are moved into the match arms
  • with ref, they are borrowed instead and represented as references

Looking at our example, the n binding in Some(n) is of type String: the actual field type from the matched structure. By contrast, the other n in Some(ref n) is a &String — that is, a reference to the field.

One is a move, the other one is a borrow.

ref annotates pattern bindings to make them borrow rather than move. It is not a part of the pattern as far as matching is concerned.

Used together

To finish off, let’s untangle the confusing example from the beginning of this post:

for &(ref name, ref value) in &query_params {
    println!("{}={}", name, value);
}

Since we know ref doesn’t affect whether or not the pattern matches, we could just as well have something like &(a, b). And this should be quite a bit easier to read: it clearly denotes we expect a reference to a 2-tuple of simple objects. Not coincidentally, such tuples are items from the vector we’re iterating over.

Problem is, without the refs we will attempt to move those items into the loop scope. But due to the way the vector is iterated over (&query_params), we’re only borrowing each item, so this is actually impossible. In fact, it would be a classic attempt to move out of a borrowed context.

It is also wholly unnecessary. The only thing this loop does is printing the items out, so accessing them through references is perfectly fine.

And this is exactly what the ref operator gives us. Adding the keyword back, we will switch from moving the values to just borrowing them instead.

To sum up

  • & denotes that your pattern expects a reference to an object. Hence & is a part of said pattern: &Foo matches different objects than Foo does.

  • ref indicates that you want a reference to an unpacked value. It is not matched against: Foo(ref foo) matches the same objects as Foo(foo).


  1. The technical term for this is a Deref coercion

  2. We can say that it doesn’t affect the satisfiability (or conversely, refutability) of the pattern. 

Continue reading

Moving out of a container in Rust

Posted on Fri 05 February 2016 in Code • Tagged with Rust, vector, borrow checker, referencesLeave a comment

To prevent the kind of memory errors that plagues many C programs, the borrow checker in Rust tracks how data is moved between variables, or accessed via references. This is all done at compile time, with zero runtime overhead, and is a sizeable part of Rust’s value offering.

Like all rigid and automated systems, however, it is necessarily constrained and cannot handle all situations perfectly. One of its limitations is treating all objects as atomic. It’s impossible for a variable to own a part of some bigger structure, neither is it possible to maintain mutable references to two or more elements of a collection.

If we nonetheless try:

fn get_name() -> String {
    let names = vec!["John".to_owned(), "Smith".to_owned()];
    join(names[0], names[1])
}

fn join(a: String, b: String) -> String {
    a + " " + &b
}

we’ll be served with a classic borrow checker error:

<anon>:3:25: 3:33 error: cannot move out of indexed content [E0507]
<anon>:3     let fullname = join(names[0], names[1]);
                                 ^~~~~~~~

Behind its rather cryptic verbiage, it informs us that we tried to move a part of the names vector — its first element — to a new variable (here, a function parameter). This isn’t allowed, because in principle it would render the vector invalid from the standpoint of strict memory safety. Rust would no longer guarantee names[0] to be a legal String: its internal pointer could’ve been invalidated by the code which the element moved to (the join function)1.

But while commendable, this guarantee isn’t exactly useful here. Even though names[0] would technically be invalid, there isn’t anyone to actually notice this fact. The names vector is inaccessible outside of the function it’s defined in, and even the function itself doesn’t look at it after the move. In its present form, the program is inarguably correct2 could’ve been accepted if partial moves from Vec were allowed by the borrow checker.

Pointers to the rescue?

Vectors wouldn’t be very useful or efficient, though, if we could only obtain copies or clones of their elements. As this is an inherent limitation of Rust’s memory model, and applies to all compound types (structs, hashmaps, etc.), it’s been recognized and countermeasures are available.

However, the idiomatic practice is to actually leave the elements be and access them solely through references:

fn get_name() -> String {
    let names = vec!["John".to_owned(), "Smith".to_owned()];
    join(&names[0], &names[1])
}

fn join(a: &String, b: &String) -> String {
    a.clone() + " " + b
}

The obvious downside of this approach is that it requires an interface change to join: it now has to accept pointers instead of actual objects3. And since the result is a completely new String, we have to either bite the bullet and clone, or write a more awkward join_into(a: &mut String, b: &String) function.
In general, making an API switch from actual objects to references has an annoying tendency to percolate up the call stacks and abstraction layers.

Vector solution

If we still insist on moving the elements out, at least in case of vector we aren’t completely out of luck. The Vec type offers several specialized methods that can slice, dice, and splice the collection in various ways. Those include:

  • split_first (and split_first_mut) for cutting right after the first element
  • split_last (and split_last_mut) for a similar cut right before the last element
  • split_at (and split_at_mut), generalized versions of the above methods
  • split_off, a partially-in-place version of split_at_mut
  • drain for moving all elements from a specified range

Other types may offer different methods, depending on their particular data layout, though drain should be available on any data structure that can be iterated over.

Structural advantage

What about user-defined types, such as structs?

Fortunately, these are covered by the compiler itself. Since accessing struct fields is a fully compile-time operation, it is possible to track the ownership of each individual object that makes up the structure. Thus there are no obstacles to simply moving all the fields:

struct Person {
    first_name: String,
    last_name: String,
}

fn get_name() -> String {
    let p = Person{first_name: "John".to_owned(),
                   last_name: "Smith".to_owned()};
    join(p.first_name, p.last_name)
}

If all else fails…

This leaves us with some rare cases when the container’s interface doesn’t quite support the exact subset of elements we want to move out. If we don’t want to drain them all and inspect every item for potential preservation, it may be time to skirt around the more dangerous areas of the language.

But I don’t necessarily mean going all out with unsafe blocks, pointers, and (let’s be honest) segfaults. Instead, we can look at the gray zone between them and the regular, borrow-checked Rust code.

Some of the functions inside the std::mem module can be said to fall into this category. Most notably, mem::swap and mem::replace allow us to operate directly on the memory blocks that back every Rust object, albeit without the dangerous ability to freely modify them.

What those functions enable is a small sleight of hand — a quick exchange of two variables or objects while the borrow checker “isn’t looking”. Possessing such an ability, we can smuggle any item out of a container as long as we’re able to provide a suitable replacement:

use std::mem;

/// Pick only the items under indices that are powers of two.
fn pick_powers_of_2<T: Default>(mut v: Vec<T>) -> Vec<T> {
    let mut result: Vec<T> = Vec::new();
    let mut i = 1;
    while i < v.len() {
        let elem = mem::replace(&mut v[i], T::default());
        result.push(elem);
        i *= 2;
    }
    result
}

Swap!
Pictured: implementation of mem::replace.

The Default value, if available, is usually a great choice here. Alternately, a Copy or Clone of some other element can also work if it’s cheap to obtain.


  1. In Rust jargon, it is sometimes said that the object has been “consumed” there. 

  2. As /u/Gankro points out on /r/rust, since Vec isn’t a part of the language itself, it doesn’t get to bend the borrow checking rules. Therefore speaking of counterfactual correctness is a bit too far-fetched in this case. 

  3. For Strings specifically, the usual practice is to require a more generic &str type (string slice) instead of &String

Continue reading