Simulating exceptions in Rust with IIFE

Posted on Sat 17 December 2016 in Code • Tagged with Rust, IIFE, error handling, exceptions, closures, lambdasLeave a comment

While many languages use exceptions for handling errors, Rust prefers a slightly different, yet very classical approach: return values.

Now, they aren’t exactly the same thing as in C, where the error is indicated by a special value within the same return type. In Rust, the Result enum can neatly separate the two, in similar vein to how ad-hoc tuples in Go do1. But unlike Go, Rust also offers additional facilities for error propagation, including the try! macro and the recently stabilized ? operator. And finally, the Result wrappings can be straightforwardly unpacked, possibly by defaulting to a known safe value.

Some conveniences of exceptions may be hard to pass up, though. The try-catch construct is evidently one of them, and Rust might eventually get it in one form or another. Before that happens, however, there is a trick that can often work as an acceptable substitute.

Many lets

Here’s an example where it can be very useful.

Have a look at the following function. Its purpose is to retrieve a GitHub login of a user who owns a specific gist — a small sample of code posted to the gists.github.com website2.

Let’s assume we have already talked to GitHub API and received the following JSON response from its relevant endpoint:

{
    "id": "12345678",
    "owner": {
        "login": "Octocat",
        ...
    }
    ...
}

Parsing it is easy: we can do it with the rustc_serialize crate, among other options. What proves a little more involved is to dig through the JSON tree in order to reach the interesting value:

use rustc_serialize::json::Json;


/// Retrieve the gist owner from a JSON received from
/// the /gists/$ID endpoint of the GitHub API.
///
/// If the gist is anonymous, "anonymous" is returned.
fn gist_owner_from_info(info: &Json) -> String {
    if let Some(info) = info.as_object() {
        if let Some(owner) = info.get("owner").and_then(|o| o.as_object()) {
            if let Some(result) = owner.get("login").and_then(|l| l.as_string()) {
                return result.to_owned();
            }
        }
    }
    "anonymous".into()
}

Whew! I guess we’re lucky we don’t need to go too deep into that JSON. The code is clearly exhibiting a rightward slant, which some people refer to as the “arrow code”, Unsurprisingly, it is generally considered bad for readability.

There are few other ways of writing this, of course, including a style reminiscent of JavaScript promises — that is, relying completely on the and_then method. Neither seem very satisfying, though, especially if you compare it with something like this:

try:
    return str(info["owner"]["login"])
except (KeyError, TypeError):
    return "anonymous"

Yes, exceptions are quite useful sometimes.

So, how can we get something like this in Rust?

JavaScript for the rescue

Succor comes from an unexpected direction. To emulate exceptions — specifically, the try-catch exception blocks — we can utilize a technique that is most popular in… JavaScript.

At least until recently, JavaScript did not have a block local scope. Since every variable declaration within a function is hoisted to the top of that function, it essentially makes function scope the only usable one (besides global, of course).

As a result, a variety of JavaScript idioms rely on introducing “superfluous” functions, solely for the purpose of creating a nested scope. Many times, those functions are neither named nor stored in any variable; rather, they are immediately invoked.

This is what is commonly understood as Immediately Invoked Function Expression, or IIFE for short.

An oft-cited example involves an IIFE which itself returns another function:

for (var i = 0; i < 10; ++i) {
    var $para = $("p#" + i);  // <p id="0">, <p id="1">, etc.
    var clickHandler = (function(i) {  // IIFE!
        return function() {
            alert("Clicked element no. " + (i + 1));
        };
    })(i);
    $para.on('click', clickHandler);
}

The function expression is necessary here, because it allows to control what exactly goes into the closure of the inner function. If the clickHandlers were assigned the function() { alert(...) } expression directly, they would all close over the same loop counter variable. All would then display the exact same message.

We don’t need to employ those workarounds in Rust. Thanks to local scoping, a simple pair of { braces } would work exactly the same. You can imagine a direct rewrite of the above example, though, where an anonymous closure is used to similar effect:

// WARNING: Not idiomatic! (Also not a real DOM library).

for i in (0..10) {
    let para = dom.find_element_by_id("p", i.to_string()).unwrap();
    let click_handler = |i| {
        move |_: Event| { dom.exec_js(&format!(
            "alert('Clicked element no. #{}');", i + 1)); }
    }(i);
    para.add_event_listener(Event::Click, click_handler)
}

In other words, Rust supports IIFEs just fine.

Just put a function on it

Okay, this is quite amusing and probably pretty neat. But does it help us with the error handling story exactly?…

Let’s take another stab at rewriting the gist_owner_from_info routine. This time, we’ll extract the meaty part into a separate function. We will also take advantage of one trivial, but very useful try_opt crate which is essentially an equivalent of the try! macro for Options:

#[macro_use] extern crate try_opt;

fn gist_owner_from_info(info: &Json) -> String {
    gist_owner_from_info_internal(info).unwrap_or("anonymous".into())
}

fn gist_owner_from_info_internal(info: &Json) -> Option<String> {
    let info = try_opt!(info.as_object());
    let owner = try_opt!(info.get("owner").and_then(|o| o.as_object()));
    let login = try_opt!(owner.get("login").and_then(|l| l.as_string()));
    Some(login.to_owned())
}

Now this should be a little easier on the eyes. (And if you want, you can eschew and_then completely in favor of more try_opt!).

The downside is that we now have this _internal function that’s awkwardly sticking out. We could pull it in, and turn it into an inner function, but why stop half-way? Let’s just make it an IIFE already:

fn gist_owner_from_info(info: &Json) -> String {
    || -> Option<String> {
        let info = try_opt!(info.as_object());
        let owner = try_opt!(info.get("owner").and_then(|o| o.as_object()));
        let login = try_opt!(owner.get("login").and_then(|l| l.as_string()));
        Some(login.to_owned())
    }().unwrap_or("anonymous".into())
}

Not bad, eh? The analogies with exception handling should be pretty evident, too3:

  • The closure itself works as a try block, with closure’s body containing the “guarded” code.
  • The unwrap family of methods (especially unwrap_or_else) dubs for a catch/except section.

Sure, we do need try! (or try_opt!) macros to mark instructions that may “throw an exception”, but with the ?-based syntax it shouldn’t be too big of a deal. And when the time comes, this code will be very easy to port to a trait-based exception handling solution that’s currently in the works.

Oh, and the best part? Both Rust and the underlying LLVM are very adept at inlining closures, so everything here should compile to optimal code.

Bonus: a lifetime conundrum

Well, almost optimal. There is one more thing left to do before we can call this a truly zero-cost abstraction.

We need to stop allocating so damn much!

It should be pretty obvious that the function doesn’t need to create a brand new String every time it’s called. The text is in the input Json, and we take that Json by reference already. It’s only fair we stop creating Strings and simply return a &str reference instead.

In fact, this should be as easy as removing the to_owned/into calls, right?

fn gist_owner_from_info(info: &Json) -> &str {
    || -> Option<&str> {
        let info = try_opt!(info.as_object());
        let owner = try_opt!(info.get("owner").and_then(|o| o.as_object()));
        owner.get("login").and_then(|l| l.as_string()))
    }().unwrap_or("anonymous")
}

Wrong, apparently. If you present this code to the compiler, it will serve you quite a mouthful of an error, including helpful tidbits in the vein of “expected A, found A”:

error[E0495]: cannot infer an appropriate lifetime for autoref due to conflicting requirements
   --> src/github.rs:3:34
    |
  3 |         let info = try_opt!(info.as_object());
    |                                  ^^^^^^^^^
    |
note: first, the lifetime cannot outlive the anonymous lifetime #1 defined on the block at 1:45...
   --> src/github.rs:1:46
    |
  1 | fn gist_owner_from_info(info: &Json) -> &str {
    |                                              ^
note: ...so that reference does not outlive borrowed content
   --> src/github.rs:3:29
    |
  3 |         let info = try_opt!(info.as_object());
    |                             ^^^^
note: but, the lifetime must be valid for the anonymous lifetime #1 defined on the block at 2:23...
   --> src/github.rs:2:24
    |
  2 |     || -> Option<&str> {
    |                        ^
note: ...so that expression is assignable (expected std::option::Option<&str>, found std::option::Option<&str>)
   --> src/github.rs:5:9
    |
  5 |         owner.get("login").and_then(|l| l.as_string())
    |

The crux of this verbiage is that the Rust compiler is unable to reconcile the lifetime of the closure’s return value, the input, and final result of the function.

It shouldn’t really be trying very hard, though, for the lifetime is obvious. It’s the same as the one implicitly attached to the input &Json. Seems like in this case, we need to be a little more helpful and label it explicitly:

fn gist_owner_from_info<'i>(info: &'i Json) -> &'i str {
    || -> Option<&'i str> {
// (rest as before)

Voila, this should now compile without any issues.

Once again, “Keep calm and add more 'lifetimes” proves to be an effective approach ;)


  1. Technically, they aren’t called tuples there but “multiple return values“. 

  2. This is something I needed to do when rewriting this Python project of mine to Rust. 

  3. This is also the closest Rust can currently get to a do notation from Haskell, at least without any macro-based hacks. 

Continue reading

Optional arguments in Rust 1.12

Posted on Thu 29 September 2016 in Code • Tagged with Rust, arguments, parameters, functionsLeave a comment

Today’s announcement of Rust 1.12 contains, among other things, this innocous little tidbit:

Option implements From for its contained type

If you’re not very familiar with it, From is a basic converstion trait which any Rust type can implement. By doing so, it defines how to create its values from some other type — hence its name.

Perhaps the most widespread application of this trait (and its from method) is allocating owned String objects from literal str values:

let hello = String::from("Hello, world!");

What the change above means is that we can do similar thing with the Option type:

let maybe_int = Option::from(42);

At a first glance, this doesn’t look like a big deal at all. For one, this syntax is much more wordy than the traditional Some(42), so it’s not very clear what benefits it offers.

But this first impression is rather deceptive. In many cases, this change can actually reduce the number of times we have to type Some(x), allowing us to replace it with just x. That’s because this new impl brings Rust quite a bit closer to having optional function arguments as a first class feature in the language.

Until now, a function defined like this:

fn maybe_plus_5(x: Option<i32>) -> i32 {
    x.unwrap_or(0) + 5
}

was the closest Rust had to default argument values. While this works perfectly — and is bolstered by compile-time checks! — callers are unfortunately required to build the Option objects manually:

let _ = maybe_plus_5(Some(42));  // OK
let _ = maybe_plus_5(None);      // OK
let _ = maybe_plus_5(42);        // error!

After Option<T> implements From<T>, however, this can change for the better. Much better, in fact, for the last line above can be made valid. All that is necessary is to take advantage of this new impl in the function definition:

fn maybe_plus_5<T>(x: T) -> i32 where Option<i32>: From<T> {
    Option::from(x).unwrap_or(0) + 5
}

Unfortunately, this results in quite a bit of complexity, up to and including the where clause: a telltale sign of convoluted, generic code. Still, this trade-off may be well worth it, as a function defined once can be called many times throughout the code base, and possibly across multiple crates if it’s a part of the public API.

But we can do better than this. Indeed, using the From trait to constrain argument types is just complicating things for no good reason. What we should so instead is use the symmetrical trait, Into, and take advantage of its standard impl:

impl<T, U> Into<U> for T where U: From<T>

Once we translate it to the Option case (now that Option<T> implements From<T>), we can switch the trait bounds around and get rid of the where clause completely:

fn maybe_plus_5<T: Into<Option<i32>>>(x: T) -> i32 {
    x.into().unwrap_or(0) + 5
}

As a small bonus, the function body has also gotten a little simpler.


So, should you go wild and change all your functions taking Optionals to look like this?… Well, technically you can, although the benefits may not outweigh the downsides for small, private functions that are called infrequently.

On the other hand, if you can afford to only support Rust 1.12 and up, this technique can make it much more pleasant to use the external API of your crates.

What’s best is the full backward compatibility with any callers that still pass Some(x): for them, the old syntax will continue to work exactly like before. Also note that the Rust compiler is smart about eliding the no-op conversion calls like the Into::into above, so you shouldn’t observe any changes in the performance department either.

And who knows, maybe at some point Rust makes the final leap, and allows skipping the Nones?…

Continue reading

for loops in Rust

Posted on Tue 26 July 2016 in Code • Tagged with Rust, loops, iteratorsLeave a comment

In this post, I’m going to talk about the for loop construct in Rust, as well as the related concepts of iterators and “iterables”.

Depending on your programming language background, they may seem somewhat familiar in terms of syntax & semantics, or rather mysterious and surprising. Their closest analogues exist in Python, but programmers of Java, C#, or (modern) C++ should recognize many relevant features and ideas as well.

Basics

The syntax of a for loop is so modest it’s almost spartan1:

let v = vec!["1", "2", "3"];
for x in v {
    println!("{}", x);
}

As you would expect, this prints three lines with 1, 2, 3. What is probably not as obvious is that over the course of this loop the v vector was expended. Trying to use it after the iteration, we’ll get a borrow checker error:

<anon>:6:22: 6:23 error: use of moved value: `v` [E0382]
<anon>:4         println!("{}", x);
<anon>:5     }
<anon>:6     println!("{:?}", v);
                              ^

In Rust jargon, the vector has been moved into the loop. Its ownership — and that of its individual elements — has been transfered there permanently. While definitely surprising when compared to other languages, this behavior is consistent with Rust’s ubiquitous policy of moving values by default.

Still, you may not expect it here because moving ownership is mostly seen at the function call boundaries. For most intents and purposes, however, you can picture a for_each function like this to be the equivalent of the for loop above:

for_each(v, |x| println!("{}", x));

This also gives us a hint on how we could prevent the move from happening. Rather than taking the vector itself, the function could accept only a reference to it:

for_each_ref(&v, |x| println!("{}", x));

After we translate this back to the looping syntax:

 for x in &v {
    println!("{}", x);
}
println!("{:?}", v);

we won’t get any more objections from the compiler.

Iterators and “iterables” in Rust

It is important to emphasize that this new ampersand symbol (&) is by no means a part of the syntax of the for loop itself. We have actually changed what object we’re iterating here. It is no longer Vec<T> — a vector itself — but &Vec<T>, an immutable reference to it. As a consequence, x is not a T (the element type) anymore, but a &T — a reference to an element2.

So it seems that in Rust, both Vec<T> and &Vec<T> are what we would call “iterables”: collections (or other objects) that we can get iterate over. The usual way this is implemented in various programming languages is by introducing an iterator object.

The iterator keeps track of what element it’s currently pointing to and supports at least the following basic operations:

  • getting the current element
  • advancing to the next element
  • signaling when no more elements are available

Some languages provide separate iterator methods for each of those tasks, but Rust chooses to combine them all into one. You can see that when looking at the Iterator trait: next is the only method to be provided by its implementations.

Desugaring with into-iterators

How is the iterator object created, though?

In a typical Rust manner, this job is delegated to another trait. This one is called IntoIterator, and it roughly corresponds to the “iterable” concept I’ve alluded to earlier:

// (simplified)
trait IntoIterator {
    fn into_iter(self) -> Iterator;
}

What is uniquely Rusty is the fact that into_iter — the sole method of this trait — doesn’t just create a new iterator for the collection. Instead, it effectively consumes the whole thing, leaving the new iterator as its only remnant and the only way to access the items3.

This, of course, is a direct manifestation of the Rust’s move-by-default policy. In this case, it protects us from the common problem of iterator invalidation which is probably all-too-familiar to C++ programmers. Because the collection is essentially “converted” to an iterator here, it is impossible:

  • for more than one iterator to exist at a time
  • to modify the collection while any iterators are in scope

Doesn’t all this “moving” and “consuming” sound familiar, by the way? I’ve mentioned earlier that when we iterate over a vector with a for loop, we essentially move it “into the loop”.

As you can probably deduce by now, what really happens is that IntoIterator::into_iter is invoked on the vector. Its result — the iterator object — is then repeatedly nexted until it returns None.

In a way, a for loop like this:

for x in v {
    // body
}

is therefore nothing else but a syntactic sugar for the following expanded version:

let mut iter = IntoIterator::into_iter(v);
loop {
    match iter.next() {
        Some(x) => {
            // body
        },
        None => break,
    }
}

You can see quite clearly that v is unusable not only after the loop ends, but before it even begins. This is because it has been moved into iter — into an iterator — through an into_iter method… of IntoIterator!

Simple, huh? :)

for loop is just a syntactic sugar for an IntoIterator::into_iter invocation, followed by repeated calling of Iterator::next.

The ampersand

On a more serious note, this move isn’t something that we’d always want to happen. Fortunately, we know a way to prevent it. Rather than iterating over the vector itself, use a reference to it:

for x in &v {
    // body
}

The great thing about this syntax is that everything said above still applies, up to and including the desugaring procedure. The into_iter method is still being invoked, except that this time it is done on the reference to the collection&Vec<T> rather than Vec<T>:

// (simplified)
impl IntoIterator for &Vec<T> {
    fn into_iter(self) -> Iterator<Item=&T> { ... }
}

The result is therefore an iterator that yields references to the elements (&T), rather than elements themselves (T). And because self above is also a reference, the collection isn’t really moved anywhere, which is why we can freely access it after the loop ends.

The exact same thing happens when looping over a mutable reference:

for x in &mut v {
    // body
}

except that this time into_iter is called for &mut Vec<T>. Result is therefore of type Iterator<Item=&mut T>, enabling us to modify the elements as we go through them.

No further compiler machinery is required to support those two cases, because everything is already handled by the same trait.

The IntoIterator desugaring works the same way for collections and both immutable and mutable references to them.

What about the iter() method?

So far, we’ve talked about regular for loops, and the very imperative style of computation they represent.

If you are more inclined towards functional programming, though, you may have seen and written rather different constructs, combining various “fluent” methods into expressions such as this one:

let doubled_odds: Vec<_> = numbers.iter()
    .filter(|&x| x % 2 != 0).map(|&x| x * 2).collect();

Methods like map and filter here are called iterator adapters, and they are all defined on the Iterator trait. Not only are they very powerful and numerous, but they can also be supplemented through several third-party crates.

In order to take advantage of the adapters, however, we need to obtain an iterator for our collection first. We know that into_iter is the way loops normally do it, so in principle we could follow the same approach here:

let doubled_odds: Vec<_> = IntoIterator::into_iter(&numbers)
    .filter(|&x| x % 2 != 0).map(|&x| x * 2).collect();

To spare us the verbosity of this explicit syntax, collections normally offer an iter method which is exactly equivalent4. This method is what you will normally see in chained expressions like the one above.

v.iter() is just a shorthand for IntoIterator::into_iter(&v).

Why not both?

The last thing to note is that Rust mandates neither loops nor iterator adapters when writing code that operates on collections of elements. When optimizations are turned on in the release mode, both versions should compile to equally efficient machine code, with closures inlined and loops unrolled where necessary.

Choosing one style over the other is therefore a matter of convention and style. Sometimes the right choice may actually be a mix of both approaches, and Rust allows it without any complaints:

fn print_prime_numbers_upto(n: i32) {
    println!("Prime numbers lower than {}:", n);
    for x in (2..n).filter(|&i| is_prime(i)) {
        println!("{}", x);
    }
}

Like before, this is possible through the same for loop desugaring that involves the IntoIterator trait. In this case, Rust will simply use a no-op implementation of this trait, “converting” any existing Iterator into” itself.

Iterators themselves are also “iterables”, implementing IntoIterator::into_iter as a pass-through.

Looping around

If you want to know even more about iterators and loops in Rust, the best source at this point is probably just the official documentation. And although mastering all the iterator adapters is of course not necessary to write effective Rust code, taking a careful look at least at the collect method (and the associated FromIterator trait) is definitely helpful.


  1. The “two-semicolon” variant of the for loop doesn’t exist in Rust. Just like in Python, the equivalent is iterating over a range object, or using a regular while loop for more complex cases. 

  2. This shift is completely transparent in the loop’s body. The way it works is based on Rust’s special mechanism called the Deref coercions. Without going into too much detail (as it is way out of scope for this post), this feature allows us to treat references to objects (&T) as if they were the objects themselves (T). The compiler will perform the necessary derefencing where possible, or signal an error in case of a (rare) ambiguity. 

  3. How do we know that? It’s because into_iter takes self (rather than &self or &mut self) as its first parameter. It means that the entire object for which this method is called is moved into its body (hence the method’s name). 

  4. Curiously enough, this equivalence isn’t encoded in the type system in any way, making it technically just a convention. It is followed consistently at least in the standard library, though. 

Continue reading

& vs. ref in Rust patterns

Posted on Thu 02 June 2016 in Code • Tagged with Rust, pattern matching, borrowing, referencesLeave a comment

Rust is one of those nice languages with pattern matching. If you don’t know, it can be thought of as a generalization of the switch statement: comparing objects not just by value (or overloaded equality operator, etc.) but by structure:

match hashmap.get(&key) {
    Some(value) => do_something_with(value),
    None => { panic!("Oh noes!"); },
}

It doesn’t end here. As you can see above, objects can also be destructured during the match (Some(value)), their parts assigned to bindings (value), and those bindings can subsequently be used in the match branch.

Neat? Definitely. In Rust, pattern matching is bread-and-butter of not only the match statement, but also for, (if)let, and even ordinary function arguments.

Mixing in Rust semantics

For a long time, however, I was somewhat confused as to what happens when references and borrowing is involved in matching. The two “operators” that often occur there are & (ampersand) and ref.

You should readily recognize the first one, as it is used pervasively in Rust to create references (and reference types). The second one quite obviously hints towards references as well. Yet those two constructs serve very different purposes when used within a pattern.

To add to the confusion, they are quite often encountered together:

use hyper::Url;

// print query string params of some URL
let url = Url::parse(some_url).unwrap();
let query_params: Vec<(String, String)> = url.query_pairs().unwrap_or(vec![]);
for &(ref name, ref value) in &query_params {
    println!("{}={}", name, value);
}

Lack of one or the other will be (at best) pointed out to you by the compiler, along with a suggestion where to add it. But addressing problems in this manner can only go so far. So how about we delve deeper and see what it’s really about?

Part of the reference, part of the pattern

Rust is very flexible as to what value can be a subject of pattern matching. You would be very hard pressed to find anything that cannot be used within a match statement, really. Both actual objects and references to objects are perfectly allowed:

struct Foo(i32);
// ...
let foo = &Foo(42);
match foo {
    x => println!("Matched!"),
}

In the latter case, however, we aren’t typically interested in the reference itself (like above). Instead, we want to determine some facts about the object it points to:

match foo {
    &Foo(num) => println!("Matched with number {}", num),
}

As you can see, this is where the ampersand comes into play. Just like a type constructor (Some, Ok, or Foo above), the & operator informs the Rust compiler what kind of value we’re expecting from the match. When it sees the ampersand, it knows we’re looking for references to certain objects, and not for the objects themselves.

Why is the distinction between an object and its reference important, though? In many other places, Rust is perfectly happy to blur the gap between references and actual objects1 — for example when calling most of their methods.

Pattern matching, however, due to its ability to unpack values into their constituent parts, is a destructive operation. Anything we apply match (or similar construct) to will be moved into the block by default:

let maybe_name = Some(String::from("Alice"));
// ...
match maybe_name {
    Some(n) => println!("Hello, {}", n),
    _ => {},
}
do_something_with(maybe_name)

Following the typical ownership semantics, this will prevent any subsequent moves and essentially consume the value:

error: use of partially moved value: `maybe_name` [E0382]
    do_something_with(maybe_name);
                      ^~~~~~~~~~

So just like the aforementioned type constructors (Some, etc.), the ampersand operator is simply part of the pattern that we match against. And just like with Some and friends, there is an obvious symmetry here: if & was used to create the value, it needs to be used when unpacking it.

The syntax used in a pattern that destructures an object is analogous to one used by the expression which created it.

Preventing the move

Errors like the one above often contain helpful notes:

note: `(maybe_name:core::option::Option::Some).0` moved here because it has type `collections::string::String`, which is moved by default
         Some(n) => println!("Hello, {}", n),
              ^

as well as hints for resolving them:

help: if you would like to borrow the value instead, use a `ref` binding as shown:
        Some(ref n) => println!("Hello, {}", n),

Here’s where ref enters the scene.

The message tells us that if we add a ref keyword in the suggested spot, we will switch from moving to borrowing for the match binding that follows (here, n). It will still capture its value exactly as before, but it will no longer assume ownership of it.

This is the crucial difference.

Unlike the ampersand, ref is not something we match against. It doesn’t affect what values match the pattern it’s in, and what values don’t2.

The only thing it changes is how parts of the matched value are captured by the pattern’s bindings:

  • by default, without ref, they are moved into the match arms
  • with ref, they are borrowed instead and represented as references

Looking at our example, the n binding in Some(n) is of type String: the actual field type from the matched structure. By contrast, the other n in Some(ref n) is a &String — that is, a reference to the field.

One is a move, the other one is a borrow.

ref annotates pattern bindings to make them borrow rather than move. It is not a part of the pattern as far as matching is concerned.

Used together

To finish off, let’s untangle the confusing example from the beginning of this post:

for &(ref name, ref value) in &query_params {
    println!("{}={}", name, value);
}

Since we know ref doesn’t affect whether or not the pattern matches, we could just as well have something like &(a, b). And this should be quite a bit easier to read: it clearly denotes we expect a reference to a 2-tuple of simple objects. Not coincidentally, such tuples are items from the vector we’re iterating over.

Problem is, without the refs we will attempt to move those items into the loop scope. But due to the way the vector is iterated over (&query_params), we’re only borrowing each item, so this is actually impossible. In fact, it would be a classic attempt to move out of a borrowed context.

It is also wholly unnecessary. The only thing this loop does is printing the items out, so accessing them through references is perfectly fine.

And this is exactly what the ref operator gives us. Adding the keyword back, we will switch from moving the values to just borrowing them instead.

To sum up

  • & denotes that your pattern expects a reference to an object. Hence & is a part of said pattern: &Foo matches different objects than Foo does.

  • ref indicates that you want a reference to an unpacked value. It is not matched against: Foo(ref foo) matches the same objects as Foo(foo).


  1. The technical term for this is a Deref coercion

  2. We can say that it doesn’t affect the satisfiability (or conversely, refutability) of the pattern. 

Continue reading

Tricks with ownership in Rust

Posted on Mon 07 March 2016 in Code • Tagged with Rust, borrow checker, reference counting, traitsLeave a comment

…or how I learned to stop worrying and love the borrow checker.

Having no equivalents in other languages, the borrow checker is arguably the most difficult thing to come to terms with when learning Rust. It’s easy to understand why it’s immensely useful, especially if you recall the various vulnerabities stemming from memory mismanagement. But that knowledge doesn’t exactly help when the compiler is whining about what seems like a perfectly correct code.

Let’s face it: it will take some time to become productive writing efficient and safe code. It’s not entirely unlike adjusting to a different paradigm such as functional programming when you’ve been writing mostly imperative code. Before that happens, though, you can use some tricks to make the transition a little easier.

Just clone it

Ideally, we’d want our code to be both correct and fast. But if we cannot quite get to the “correctness” part yet — because our program doesn’t, you know, compile — then how about paying for it with a small (and refundable) performance hit?

This is where the clone method comes in handy. Many problems with the borrow checker stem from trying to spread object ownership too thin. It is a precious resource and it’s not very cheap to “produce”, which is why good Rust code often deals with just immutable or mutable references.

But if that proves difficult, then “making more objects” is a great intermediate solution. Incidentally, this is what higher level languages are doing all the time, and often transparently. To ease the transition to Rust from those languages, we can start off by replicating their behavior.

As an example, consider a function that tries to convert some value to String:

struct Error;

fn maybe_to_string<T>(v: T) -> Result<String, Error> {
    // omitted
}

If we attempt to build upon it and create a Vector version:

fn maybe_all_to_string<T>(v: Vec<T>) -> Result<Vec<String>, Error> {
    let results: Vec<_> = v.iter().map(maybe_to_string).collect();
    if let Some(res) = results.iter().find(|r| r.is_err()) {
        return Err(Error);
    }
    Ok(results.iter().map(|r| r.ok().unwrap()).collect())
}

then we’ll be unpleasantly surprised by a borrow checker error:

error: cannot move out of borrowed content [E0507]
    Ok(results.iter().map(|r| r.ok().unwrap()).collect())
                              ^

Much head scratching will ensue, and we may eventually find an idiomatic and efficient solution. However, a simple stepping stone in the shape of additional clone() call can help move things forward just a little quicker:

#[derive(Clone)]
struct Error;

// ...
Ok(results.iter().map(|r| r.clone().ok().unwrap()).collect())

The performance tradeoff is explicit, and easy to find later on with a simple grep clone\(\) or similar. When you learn to do things the Rusty way, it won’t be hard to go back to your “hack” and fix it properly.

Refcounting to the rescue

Adding clone() willy-nilly to make the code compile is a valid workaround when we’re just learning. Sometimes, however, even some gratuitous cloning doesn’t quite solve the problem, because the clone() itself can become an issue.

For one, it requires our objects to implement the Clone trait. This was apparent even in our previous example, since we had to add a #[derive(Clone)] attribute to the struct Error in order to make it clone-able.

Fortunately, in the vast majority of cases this will be all that’s necessary, as most built-in types in Rust implement Clone already. One notable exception are function traits (FnOnce, Fn, and FnMut) which are used to store and refer to closures1. Structures and other custom types that contain them (or those which may contain them) cannot therefore implement Clone through a simple #[derive] annotation:

/// A value that's either there already
/// or can be obtained by calling a function.
#[derive(Clone)]
enum LazyValue<T: Clone> {
    Immediate(T),
    Deferred(Fn() -> T),
}
error: the trait `core::marker::Sized` is not implemented for the type `core::ops::Fn() -> T + 'static` [E0277]
    #[derive(Clone)]
             ^~~~~

What can we do in this case, then? Well, there is yet another kind of performance concessions we can make, and this one will likely sound familiar if you’ve ever worked with a higher level language before. Instead of actually cloning an object, you can merely increment its reference counter. As the most rudimentary kind of garbage collection, this allows to safely share the object between multiple “owners”, where each can behave as if it had its own copy of it.

Rust’s pointer type that provides reference counting capabilities is called std::rc::Rc. Conceptually, it is analogous to std::shared_ptr from C++, and it similarly keeps the refcount updated when the pointer is “acquired” (clone-ed) and “released” (drop-ed). Because no data is moved around during either of those two operations, Rc can refer even to types whose size isn’t known at compilation time, like abstract closures:

use std::rc::Rc;

#[derive(Clone)]
enum LazyValue<T: Clone> {
    Immediate(T),
    Deferred(Rc<Fn() -> T>),
}

Wrapping them in Rc therefore makes them “cloneable”. They aren’t actually cloned, of course, but because of the inherent immutability of Rust types they will appear so to any outside observer2.

Move it!

Ultimately, most problems with the borrow checker boil down to unskillful mixing of the two ways you handle data in Rust. There is ownership, which is passed around by moving the values; and there is borrowing, which means operating on them through references.

When you try to switch from one to the other, some friction is bound to occur. Code that uses references, for example, has to be copiously sprinkled with & and &mut, and may sometimes require explicit lifetime annotations. All these have to be added or removed, and changes like that tend to propagate quite readily to the upper layers of the program’s logic.

Therefore it is generally preferable, if at all possible, to deal with data directly and not through references. To maintain efficiency, however, we need to learn how to move the objects through the various stages of our algorithms. It turns out it’s surprisingly easy to inadvertently borrow something, hindering the possibility of producing a moved value.

Take our first example. The intuitively named Vec::iter method produces an iterator that we can map over, but does it really go over the actual items in the vector? Nope! It gives us a reference to each one — a borrow, if you will — which is exactly why we originally had to use clone to get out of this bind.

Instead, why not just get the elements themselves, by moving them out of the vector? Vec::into_iter allows to do exactly this:

Ok(results.into_iter().map(|r| r.ok().unwrap()).collect())

and enables us to remove the clone() call. The family of similar into_X (or even just into) methods can be reliably counted on at least in the standard library. They are also part of a more-or-less official naming convention that you should also follow in your own code.


  1. Note how this is different from function types, i.e. fn(A, B, C, ...) -> Ret. It is because plain functions do not carry their closure environments along with them. This makes them little more than just pointers to some code, and those can be freely Clone-d (or even Copy-ed). 

  2. If you want both shared ownership (“fake cloneability”) and the ability to mutate the shared value, take a look at the RefCell type and how it can be wrapped in Rc to achieve both. 

Continue reading

Moving out of a container in Rust

Posted on Fri 05 February 2016 in Code • Tagged with Rust, vector, borrow checker, referencesLeave a comment

To prevent the kind of memory errors that plagues many C programs, the borrow checker in Rust tracks how data is moved between variables, or accessed via references. This is all done at compile time, with zero runtime overhead, and is a sizeable part of Rust’s value offering.

Like all rigid and automated systems, however, it is necessarily constrained and cannot handle all situations perfectly. One of its limitations is treating all objects as atomic. It’s impossible for a variable to own a part of some bigger structure, neither is it possible to maintain mutable references to two or more elements of a collection.

If we nonetheless try:

fn get_name() -> String {
    let names = vec!["John".to_owned(), "Smith".to_owned()];
    join(names[0], names[1])
}

fn join(a: String, b: String) -> String {
    a + " " + &b
}

we’ll be served with a classic borrow checker error:

<anon>:3:25: 3:33 error: cannot move out of indexed content [E0507]
<anon>:3     let fullname = join(names[0], names[1]);
                                 ^~~~~~~~

Behind its rather cryptic verbiage, it informs us that we tried to move a part of the names vector — its first element — to a new variable (here, a function parameter). This isn’t allowed, because in principle it would render the vector invalid from the standpoint of strict memory safety. Rust would no longer guarantee names[0] to be a legal String: its internal pointer could’ve been invalidated by the code which the element moved to (the join function)1.

But while commendable, this guarantee isn’t exactly useful here. Even though names[0] would technically be invalid, there isn’t anyone to actually notice this fact. The names vector is inaccessible outside of the function it’s defined in, and even the function itself doesn’t look at it after the move. In its present form, the program is inarguably correct2 could’ve been accepted if partial moves from Vec were allowed by the borrow checker.

Pointers to the rescue?

Vectors wouldn’t be very useful or efficient, though, if we could only obtain copies or clones of their elements. As this is an inherent limitation of Rust’s memory model, and applies to all compound types (structs, hashmaps, etc.), it’s been recognized and countermeasures are available.

However, the idiomatic practice is to actually leave the elements be and access them solely through references:

fn get_name() -> String {
    let names = vec!["John".to_owned(), "Smith".to_owned()];
    join(&names[0], &names[1])
}

fn join(a: &String, b: &String) -> String {
    a.clone() + " " + b
}

The obvious downside of this approach is that it requires an interface change to join: it now has to accept pointers instead of actual objects3. And since the result is a completely new String, we have to either bite the bullet and clone, or write a more awkward join_into(a: &mut String, b: &String) function.
In general, making an API switch from actual objects to references has an annoying tendency to percolate up the call stacks and abstraction layers.

Vector solution

If we still insist on moving the elements out, at least in case of vector we aren’t completely out of luck. The Vec type offers several specialized methods that can slice, dice, and splice the collection in various ways. Those include:

  • split_first (and split_first_mut) for cutting right after the first element
  • split_last (and split_last_mut) for a similar cut right before the last element
  • split_at (and split_at_mut), generalized versions of the above methods
  • split_off, a partially-in-place version of split_at_mut
  • drain for moving all elements from a specified range

Other types may offer different methods, depending on their particular data layout, though drain should be available on any data structure that can be iterated over.

Structural advantage

What about user-defined types, such as structs?

Fortunately, these are covered by the compiler itself. Since accessing struct fields is a fully compile-time operation, it is possible to track the ownership of each individual object that makes up the structure. Thus there are no obstacles to simply moving all the fields:

struct Person {
    first_name: String,
    last_name: String,
}

fn get_name() -> String {
    let p = Person{first_name: "John".to_owned(),
                   last_name: "Smith".to_owned()};
    join(p.first_name, p.last_name)
}

If all else fails…

This leaves us with some rare cases when the container’s interface doesn’t quite support the exact subset of elements we want to move out. If we don’t want to drain them all and inspect every item for potential preservation, it may be time to skirt around the more dangerous areas of the language.

But I don’t necessarily mean going all out with unsafe blocks, pointers, and (let’s be honest) segfaults. Instead, we can look at the gray zone between them and the regular, borrow-checked Rust code.

Some of the functions inside the std::mem module can be said to fall into this category. Most notably, mem::swap and mem::replace allow us to operate directly on the memory blocks that back every Rust object, albeit without the dangerous ability to freely modify them.

What those functions enable is a small sleight of hand — a quick exchange of two variables or objects while the borrow checker “isn’t looking”. Possessing such an ability, we can smuggle any item out of a container as long as we’re able to provide a suitable replacement:

use std::mem;

/// Pick only the items under indices that are powers of two.
fn pick_powers_of_2<T: Default>(mut v: Vec<T>) -> Vec<T> {
    let mut result: Vec<T> = Vec::new();
    let mut i = 1;
    while i < v.len() {
        let elem = mem::replace(&mut v[i], T::default());
        result.push(elem);
        i *= 2;
    }
    result
}

Swap!
Pictured: implementation of mem::replace.

The Default value, if available, is usually a great choice here. Alternately, a Copy or Clone of some other element can also work if it’s cheap to obtain.


  1. In Rust jargon, it is sometimes said that the object has been “consumed” there. 

  2. As /u/Gankro points out on /r/rust, since Vec isn’t a part of the language itself, it doesn’t get to bend the borrow checking rules. Therefore speaking of counterfactual correctness is a bit too far-fetched in this case. 

  3. For Strings specifically, the usual practice is to require a more generic &str type (string slice) instead of &String

Continue reading

Rust: first impressions

Posted on Thu 10 December 2015 in Code • Tagged with Rust, pointers, types, FP, OOP, traitsLeave a comment

Having recently been writing some C++ code at work, I had once again experienced the kind of exasperation that this cumbersome language evokes on regular basis. When I was working in it less sporadically, I was shrugging it off and telling myself it’s all because of the low level it operates on. Superior performance was the other side of the deal, and it was supposed to make all the trade-offs worthwhile.

Now, however, I realized that running close to the metal by no means excuses the sort of clunkiness that C++ permits. For example, there really is no reason why the archaically asinine separation of header & source files — with its inevitable redundancy of declarations and definitions, worked around with Java-esque contraptions such as pimpl — is still the bread and butter of C++ programs.
Same goes for the lack of sane dependency management, or a universal, portable build system. None of those would be at odds with native compilation to machine code, or runtime speeds that are adequate for real-time programs.

Rather than dwelling on those gripes, I thought it’d be more productive to look around and see what’s the modern offerring in the domain of lower level, really fast languages. The search wasn’t long at all, because right now it seems there is just one viable contender: Rust1.

Rusty systems

Rust introduces itself as a “systems programming language”, which is quite a bold claim. What followed the last time this phrase has been applied to an emerging language — Go — was a kind of word twisting that’s more indicative of politics, not computer science.

But Rust’s pretense to the system level is well justified. It clearly provides the requisite toolkit for working directly with the hardware, be it embedded controllers or fully featured computers. It offers compilation to native machine code; direct memory access; running time guarantees thanks to the lack of GC-incuded stops; and great interoperability through static and dynamic linkage.

In short, with Rust you can wreak havoc against the RAM and twiddle bits to your heart’s content.

Safe and sound

To be fair, though, the “havoc” part is not entirely accurate. Despite its focus on the low level, efficient computing, Rust aims to be a very safe language. Unlike C, it actively tries to prevent the programmer from shooting themselves in the foot — though it will hand you the gun if you but ask for it.

The safety guarantees provided by Rust apply to resource management, with the specific emphasis on memory and pointers to it. The way that most contemporary languages deal with memory is by introducing a garbage collector which mostly (though not wholly) relieves the programmer from thinking about allocations and deallocations. However, the kind of global, stop-the-world garbage collections (e.g. mark-and-sweep) is costly and unpredictable, ruling it out as a mechanism for real-time systems.

For this reason, Rust doesn’t mandate a GC of this kind2. And although it offers mechanisms that are similar to smart pointers from C++ (e.g. std::shared_ptr), it is actually preferable and safer to use regular, “naked” pointers: &Foo versus Cell<Foo> or RefCell<Foo> (which are some of the Rust’s “smart pointer” types).

The trick is in the clever compiler. As long as we use regular pointers, it is capable of detecting potential memory bugs at compilation time. They are referred to as “data races” in Rust’s terminology, and include perennial problems that will segfault any C code which wasn’t written with utmost care.

Part of those safety guarantees is also the default immutability of references (pointers). The simplest reference of type &Foo in Rust translates to something like const Foo * const in C3. You have to explicitly request mutability with the mut keyword, and Rust ensures there is always at most one mutable reference to any value, thus preventing problems caused by pointer aliasing.

But what if you really must sling raw pointers, and access arbitrary memory locations? Maybe you are programming a microcontroller where I/O is done through a special memory region. For those occasions, Rust has got you covered with the unsafe keyword:

// Read the state of a diode in some imaginary uC.
fn get_led_state(i: isize) -> bool {
    assert!(i >= 0 && i <= 4, "There are FOUR lights!");
    let p: *const u8 = 0x1234 as *const u8;  // known memory location
    unsafe { *p .offset(i) != 0 }
}

Its usage, like in the above example, can be very localized, limited only to those places where it’s truly necessary and guarded by the appropriate checks. As a result, the interface exposed by the above function can be considered safe. The unrestricted memory access can be contained to where it’s really inevitable.

Typing counts

Ensuring memory safety is not the only way in which Rust differentiates itself from C. What separates those two languages is also a few decades of practice and research into programming semantics. It’s only natural to expect Rust to take advantage of this progress.

And advantage it takes. Although Rust’s type system isn’t nearly as advanced and complex like — say — Scala’s, it exhibits several interesting properties that are indicative of its relatively modern origin.

First, it mixes the two most popular programming paradigms — functional and object-oriented — in roughly equal concentrations, as opposed to being biased towards the latter. Rust doesn’t have interfaces or classes: it has traits and their implementations. Even though they often fulfill similar purposes of abstraction and encapsulation, these constructs are closer to the concepts of type classes and their instances, which are found for example in Haskell.

Still, the more familiar notions of OOP aren’t too far off. Most of the key functionality of classes, for example, can be simulated by implementing “default” traits for user-defined types:

struct Person {
    first_name: String,
    last_name: String,
}

impl Person {
    fn new(first_name: &str, last_name: &str) -> Person {
        Person {
            first_name: first_name.to_string(),
            last_name: last_name.to_string(),
        }
    }

    fn greet(&self) {
        println!("Hello, {}!", self.first_name);
    }
}

// usage
let p = Person::new("John", "Doe");
p.greet();

The second aspect of Rust’s type system that we would come to expect from a new language is its expressive power. Type inference is nowadays a staple, and above we can observe the simplest form of it. But it extends further, to generic parameters, closure arguments, and closure return values.

Generics, by the way, are quite nice as well. Besides their applicability to structs, type aliases, functions, traits, trait implementations, etc., they allow for constraining their arguments with traits. This is similar to the abandoned-and-not-quite-revived-yet idea of concepts in C++, or to an analogous mechanism from C#.

The third common trend in contemporary language design is the use of type system to solve common tasks. Rust doesn’t go full Haskell and opt for monads for everything, but its Option and Result types are evidently the functional approach to error handling4. To facilitate their use, a powerful pattern matching facility is also present in Rust.

Unexpectedly pythonic

If your general go-to language is Python, you will find Rust a very nice complement and possibly a valuable instrument in your coding arsenal. Interoperability between Python and Rust is stupidly easy, thanks to both the ctypes module and the extreme simplicity of creating portable, shared libraries in Rust. Offloading some expensive, GIL-bypassing computation to a fast, native code written in Rust can thus be a relatively painless way of speeding up crucial parts of a Python program.

But somewhat more surprisingly, Rust has quite a few bits that seem to be directly inspired by Python semantics. Granted, those two languages are conceptually pretty far apart in general, but the analogies are there:

  • The concept of iterators in Rust is very similar to iterables in Python. Even the for loop is basically identical: rather than manually increment a counter, both in Rust and Python you iterate over a range of numbers.
    Oh, and both languages have an enumerate method/ function that yields pairs of (index, element).

  • Syntax for method definition in Rust uses the self keyword as first argument to distinguish between instance methods and “class”/”static” methods (or associated functions in Rust’s parlance). This is even more pythonic than in actual Python, where self is technically just a convention, albeit an extremely strong one.

  • In either language, overloading operators doesn’t use any new keywords or special syntax, like it does in C++, C#, and others. Python accomplishes it through __magic__ methods, whereas Rust has very similarly named operator traits.

  • Rust basically has doctest. If you don’t know, the doctest module is a standard Python testing utility that can run usage examples found in documentation comments and verify their correctness. Rust version (rustdoc) is even more powerful and flexible, allowing for example to mark additional boilerplate lines that should be run when testing examples, but not included in the generated documentation.

I’m sure the list doesn’t end here and will grow over time. As of this writing, for example, nightly builds of Rust already offer advanced slice pattern matching which are very similar to the extended iterable unpacking from Python 3.

Is it worth it?

Depending on your background and the programming domain you are working in, you may be wondering if Rust is a language that’s worth looking into now, or in the near future.

Firstly, let me emphasize that it’s still in its early stages. Although the stable version 1.0 has been released a good couple of months ago, the ecosystem isn’t nearly as diverse and abundant as in some of the other new languages.

If you are specifically looking to deploying Rust-written API servers, backends, and other — shall I use the word — microservices, then right now you’ll probably be better served by more established solutions, like Java with fibers, asynchronous Python on PyPy, Erlang, Go, node.js, or similar. I predict Rust catching up here in the coming months, though, because the prospect of writing native speed JSON slingers with relative ease is just too compelling to pass.

The other interesting area for Rust is game programming, because it’s one of the few languages capable of supporting even the most demanding AAA+ productions. The good news is that portable, open source game engines are already here. The bad news is that most of the existing knowledge about designing and coding high performance games is geared towards writing (stripped down) C++. The community is also rather stubborn reluctant to adopt anything that may carry even a hint of potentially unknown performance implications. Although some inroads have been made (here’s, for example, an entity component system written in Rust), and I wouldn’t be surprised to see indie games written in Rust, it probably won’t take over the industry anytime soon.

When it comes to hardware, though, Rust may already have the upper hand. It is obviously much easier language to program in than pure C. Along with its toolchain’s ability to produce minimal executables, it makes for a compelling language for programming microcontrollers and other embedded devices.

So in short, Rust is pretty nice. And if you have read that far, I think you should just go ahead and have a look for yourself :)


  1. Because as much as we’d like for D to finally get somewhere, at this point we may have better luck waiting for the Year of Linux on Desktop to dawn… 

  2. Of course, nobody has stopped the community from implementing it

  3. Strictly speaking, it’s the binding such as let x = &foo; that translates to it. Unadorned C pointer type Foo* would correspond to mutable binding to a mutable reference in Rust, i.e. let mut x = &mut foo;

  4. Their Haskell equivalents are Maybe and Either type classes, respectively. 

Continue reading