Karol Kuczmarski's Blog

Iteration patterns for Result & Option

Posted on Mon 10 April 2017 in Code • Tagged with Rust, iterators

Towards the end of my previous post about for loops in Rust, I mentioned how those loops can often be expressed in a more declarative way. This alternative approach involves chaining methods of the Iterator trait to create specialized transformation pipelines:

let odds_squared: Vec<_> = (1..100)
    .filter(|x| x % 2 != 0)
    .map(|x| x * x)
    .collect();

Code like this isn’t unique to Rust, of course. Similar patterns are prevalent in functional languages such as F#, and can also be found in Java (Streams), imperative .NET (LINQ), JavaScript (LoDash) and elsewhere.

That said, Rust also has its fair share of unique iteration idioms. In this post, we’re going to explore those arising at the intersection of iterators and the most common Rust enums: Result and Option.

filter_map()

When working with iterators, we’re almost always interested in selecting elements that match some criterion or passing them through a transformation function. It’s not even uncommon to want both of those things, as demonstrated by the initial example in this post.

You can, of course, accomplish those two tasks independently: Rust’s filter and map methods work just fine for this purpose. But there exists an alternative, and in some cases it fits the problem amazingly well.

Meet filter_map. Here’s what the official docs have to say about it:

Creates an iterator that both filters and maps.

Well, duh.

On a more serious note, the common pattern that filter_map simplifies is unwrapping a series of Options. If you have a sequence of maybe-values, and you want to retain only those that are actually there, filter_map can do it in a single step:

// Get the sequence of all files matching a glob pattern via the glob crate.
let some_files = glob::glob("foo.*").unwrap().map(|x| x.unwrap());
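// Retain only their extensions, e.g. "txt" or "md" (without the leading dot).
// extension() borrows from the path, so copy it into an owned OsString.
let file_extensions = some_files.filter_map(|p| p.extension().map(|e| e.to_owned()));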

The equivalent that doesn’t use filter_map would have to split the checking & unwrapping of Options into separate steps:

let file_extensions = some_files.map(|p| p.extension().map(|e| e.to_owned()))
    .filter(|e| e.is_some()).map(|e| e.unwrap());

Because of this check & unwrap logic, filter_map can be useful even with a no-op predicate (.filter_map(|x| x)) if we already have the Option objects handy. Otherwise, it’s often very easy to obtain them, which is exactly the case for the Result type:

use std::{fs, io::{BufRead, BufReader}};

// Read all text lines from a file:
let lines: Vec<_> = BufReader::new(fs::File::open("file.ext")?)
    .lines().filter_map(Result::ok).collect();

With a simple .filter_map(Result::ok), like above, we can pass through a sequence of Results and yield only the “successful” values. I find this particular idiom to be extremely useful in practice, as long as you remember that Errors will be discarded by it¹.
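
To make both idioms concrete, here is a minimal sketch. The function names and the use of str::parse as a source of Results are my own choices for illustration; they don't come from any particular codebase.

```rust
/// Keeps only the values that are present, dropping every `None`.
fn present_values(maybe_values: Vec<Option<i32>>) -> Vec<i32> {
    // The closure is the identity function: filter_map itself performs
    // the "check and unwrap" step for each Option.
    maybe_values.into_iter().filter_map(|x| x).collect()
}

/// Parses every input that is a valid integer and silently skips the rest.
fn parse_valid(inputs: &[&str]) -> Vec<i32> {
    inputs
        .iter()
        .map(|s| s.parse::<i32>())
        .filter_map(Result::ok) // the parse errors are discarded here
        .collect()
}

/// The symmetrical variant: counts only the inputs that failed to parse.
fn count_invalid(inputs: &[&str]) -> usize {
    inputs
        .iter()
        .map(|s| s.parse::<i32>())
        .filter_map(Result::err)
        .count()
}
```

Calling parse_valid(&["1", "x", "3"]) yields just the numbers 1 and 3, with no trace of the fact that "x" was rejected, so this is only appropriate when ignoring failures is really what you want.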

As a final note on filter_map, you need to keep in mind that regardless of how great it often is, not all combinations of filter and map should be replaced by it. When deciding whether it’s appropriate in your case, it is helpful to consider the equivalence of these two expressions:

iter.filter(f).map(m)
iter.filter_map(|x| if f(&x) { Some(m(x)) } else { None })

Simply put, if you find yourself writing conditions like this inside filter_map, you’re probably better off with two separate processing steps.
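
For comparison, here are both forms written out for the odd-squares pipeline from the beginning of this post. This is only a sketch with made-up function names, but it shows how the filter_map version ends up carrying the condition inside its closure:

```rust
/// Two separate steps: filter first, then map.
fn odd_squares_two_steps(numbers: &[i32]) -> Vec<i32> {
    numbers
        .iter()
        .copied()
        .filter(|x| x % 2 != 0)
        .map(|x| x * x)
        .collect()
}

/// The same logic squeezed into a single filter_map call.
fn odd_squares_filter_map(numbers: &[i32]) -> Vec<i32> {
    numbers
        .iter()
        .copied()
        .filter_map(|x| if x % 2 != 0 { Some(x * x) } else { None })
        .collect()
}
```

Both functions return the same vector for any input; the first one is simply easier to read.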

collect()

Let’s go back to the last example with a sequence of Results. Since the final sequence won’t include any Erroneous values, you may be wondering if there is a way to preserve them.

In more formal terms, the question is about turning a vector of results (Vec<Result<T, E>>) into a result with a vector (Result<Vec<T>, E>). We’d like for this aggregated result to only be Ok if all original results were Ok. Otherwise, we should just get the first Error.

Believe it or not, this is probably the most common Rust problem!²

Of course, that doesn’t necessarily mean the problem is particularly hard. Possible solutions exist in both an iterator version:

let result = results.into_iter().fold(Ok(vec![]), |mut v, r| match r {
    Ok(x) => { v.as_mut().map(|v| v.push(x)); v },
    Err(e) => Err(e),
});

and in a loop form:

let mut result = Ok(vec![]);
for r in results {
    match r {
        Ok(x) => result.as_mut().map(|v| v.push(x)),
        Err(e) => { result = Err(e); break; },
    };
}

but I suspect not many people would call them clear and readable, let alone pretty³.

Fortunately, you don’t need to pollute your codebase with any of those workarounds. Rust offers an out-of-the-box solution which solves this particular problem, and its only flaw is one that I hope to address through this very post.

So, here it goes:

let result: Result<Vec<_>, _> = results.collect();

Yep, that’s all of it.

The background story is that Result<Vec<T>, E> simply “knows” how to construct itself from a sequence of Results. Unfortunately, this API is hidden behind Rust’s iterator abstraction, and specifically the fact that Result implements FromIterator in this particular manner. The way the documentation page for Result is structured, however — with trait implementations at the very end — ensures this useful fact remains virtually undiscoverable.

Because let’s be honest: no one scrolls that far.
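
Here is a small, self-contained sketch of the idiom. The parse_all function is hypothetical, and str::parse merely serves as a convenient source of Results:

```rust
use std::num::ParseIntError;

/// Parses all inputs, or returns the first parse error encountered.
fn parse_all(inputs: &[&str]) -> Result<Vec<i32>, ParseIntError> {
    // The return type tells collect() to build a Result<Vec<_>, _>,
    // which stops at the first Err it sees.
    inputs.iter().map(|s| s.parse::<i32>()).collect()
}
```

With inputs such as ["1", "2", "3"] this produces Ok with all three numbers, while a single malformed element turns the whole result into an Err.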

Incidentally, Option offers analogous functionality: a sequence of Option<T> can be collected into Option<Vec<T>>, which will be None if any of the input elements were. As you may suspect, this fact is equally hard to find in the relevant docs.
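
The Option flavor looks almost identical. In this sketch, initials is a hypothetical helper that fails as a whole if any word turns out to be empty:

```rust
/// Returns the first character of every word, or None if any word is empty.
fn initials(words: &[&str]) -> Option<Vec<char>> {
    // chars().next() is None for an empty string, and a single None
    // makes the whole collected Option come out as None.
    words.iter().map(|w| w.chars().next()).collect()
}
```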

But the good news is: you know about all this now! :) And perhaps thanks to this post, those handy tricks will become a little better known in the wider Rust community.

partition()

The last technique I wanted to present here follows naturally from the other idioms that apply to Results. Instead of extracting just the Ok values with filter_map, or keeping only the first error through collect, we will now learn how to retain all the errors and all the values, both neatly separated.

The partition method, which is what this section is about, is essentially a more powerful variant of filter. While the latter only returns items that match a predicate, partition will also give us the ones which don’t.

Using it to slice an iterable of Results is straightforward:

let (oks, fails): (Vec<_>, Vec<_>) = results.partition(Result::is_ok);

The only thing that remains cumbersome is the fact that both parts of the resulting tuple still contain just Results. Ideally, we would like them to be already unwrapped into values and errors, but unfortunately we need to do this ourselves:

let values: Vec<_> = oks.into_iter().map(Result::unwrap).collect();
let errors: Vec<_> = fails.into_iter().map(Result::unwrap_err).collect();

As an alternative, the partition_map method from the itertools crate can accomplish the same thing in a single step, albeit a more verbose one.
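
If you would rather not pull in an extra crate, a hand-written single-pass version is not much work either. The sketch below uses only the standard library; split_results is a name I made up, and it is not how itertools implements partition_map:

```rust
/// Splits a sequence of Results into successes and failures in one pass.
fn split_results<T, E>(results: impl IntoIterator<Item = Result<T, E>>) -> (Vec<T>, Vec<E>) {
    let mut values = Vec::new();
    let mut errors = Vec::new();
    for r in results {
        // Each Result is unwrapped right away, so no second pass is needed.
        match r {
            Ok(v) => values.push(v),
            Err(e) => errors.push(e),
        }
    }
    (values, errors)
}
```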

¹ A symmetrical technique is to use .filter_map(Result::err) to get just the Error objects, but that’s probably much less useful as it drops all the successful values.
² Based on my completely unsystematic and anecdotal observations, someone asks about this on the #rust-beginners IRC approximately every other day.
³ The fold variant is also rife with type inference traps, often requiring explicit type annotations, a “no-op” Err arm in match, or both.

for loops in Rust

Posted on Tue 26 July 2016 in Code • Tagged with Rust, loops, iterators

In this post, I’m going to talk about the for loop construct in Rust, as well as the related concepts of iterators and “iterables”.

Depending on your programming language background, they may seem somewhat familiar in terms of syntax & semantics, or rather mysterious and surprising. Their closest analogues exist in Python, but programmers of Java, C#, or (modern) C++ should recognize many relevant features and ideas as well.

Basics

The syntax of a for loop is so modest it’s almost spartan¹:

let v = vec!["1", "2", "3"];
for x in v {
    println!("{}", x);
}

As you would expect, this prints three lines with 1, 2, 3. What is probably not as obvious is that over the course of this loop the v vector was consumed. If we try to use it after the iteration, we’ll get a borrow checker error:

<anon>:6:22: 6:23 error: use of moved value: `v` [E0382]
<anon>:4         println!("{}", x);
<anon>:5     }
<anon>:6     println!("{:?}", v);
                              ^

In Rust jargon, the vector has been moved into the loop. Its ownership, and that of its individual elements, has been transferred there permanently. While definitely surprising when compared to other languages, this behavior is consistent with Rust’s ubiquitous policy of moving values by default.

Still, you may not expect it here because moving ownership is mostly seen at the function call boundaries. For most intents and purposes, however, you can picture a for_each function like this to be the equivalent of the for loop above:

for_each(v, |x| println!("{}", x));

This also gives us a hint on how we could prevent the move from happening. Rather than taking the vector itself, the function could accept only a reference to it:

for_each_ref(&v, |x| println!("{}", x));

After we translate this back to the looping syntax:

for x in &v {
    println!("{}", x);
}
println!("{:?}", v);

we won’t get any more objections from the compiler.
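
The for_each and for_each_ref functions used in this analogy are hypothetical; they exist only to illustrate the point. A minimal sketch of how they could be defined:

```rust
/// Hypothetical helper: takes the vector by value, so the caller loses it.
fn for_each<T, F: FnMut(T)>(v: Vec<T>, mut f: F) {
    for x in v {
        f(x);
    }
}

/// Hypothetical helper: only borrows the vector, so the caller keeps it.
fn for_each_ref<T, F: FnMut(&T)>(v: &Vec<T>, mut f: F) {
    for x in v {
        f(x);
    }
}
```

After for_each(v, ...) any further use of v is a compile error, exactly as with the plain loop, whereas for_each_ref(&v, ...) leaves v fully usable.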

Iterators and “iterables” in Rust

It is important to emphasize that this new ampersand symbol (&) is by no means a part of the syntax of the for loop itself. We have actually changed what object we’re iterating here. It is no longer Vec<T> — a vector itself — but &Vec<T>, an immutable reference to it. As a consequence, x is not a T (the element type) anymore, but a &T — a reference to an element².

So it seems that in Rust, both Vec<T> and &Vec<T> are what we would call “iterables”: collections (or other objects) that we can iterate over. The usual way this is implemented in various programming languages is by introducing an iterator object.

The iterator keeps track of what element it’s currently pointing to and supports at least the following basic operations:

getting the current element
advancing to the next element
signaling when no more elements are available

Some languages provide separate iterator methods for each of those tasks, but Rust chooses to combine them all into one. You can see that when looking at the Iterator trait: next is the only method to be provided by its implementations.
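
To see how a single next method covers all three operations, consider this sketch of a custom iterator. Countdown is a made-up type used purely for illustration:

```rust
/// A hypothetical iterator that counts down from a starting number to 1.
struct Countdown {
    remaining: u32,
}

impl Iterator for Countdown {
    type Item = u32;

    // One method does it all: it returns the current element, advances
    // the iterator, and signals exhaustion by returning None.
    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            None
        } else {
            let current = self.remaining;
            self.remaining -= 1;
            Some(current)
        }
    }
}
```

Every other Iterator method (map, filter, collect and so on) comes for free once next is in place.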

Desugaring with into-iterators

How is the iterator object created, though?

In a typical Rust manner, this job is delegated to another trait. This one is called IntoIterator, and it roughly corresponds to the “iterable” concept I’ve alluded to earlier:

// (simplified)
trait IntoIterator {
    fn into_iter(self) -> Iterator;
}

What is uniquely Rusty is the fact that into_iter — the sole method of this trait — doesn’t just create a new iterator for the collection. Instead, it effectively consumes the whole thing, leaving the new iterator as its only remnant and the only way to access the items³.

This, of course, is a direct manifestation of Rust’s move-by-default policy. In this case, it protects us from the common problem of iterator invalidation which is probably all-too-familiar to C++ programmers. Because the collection is essentially “converted” to an iterator here, it is impossible:

for more than one iterator to exist at a time
to modify the collection while any iterators are in scope
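
For reference, the real trait in the standard library is a bit richer than the simplified version above: it declares two associated types, Item and IntoIter, and into_iter returns Self::IntoIter. Here is a sketch of implementing it for a hypothetical Playlist type:

```rust
/// A hypothetical collection type wrapping a vector of track names.
struct Playlist {
    tracks: Vec<String>,
}

impl IntoIterator for Playlist {
    type Item = String;
    type IntoIter = std::vec::IntoIter<String>;

    // Taking `self` by value moves the whole playlist into this method;
    // afterwards the returned iterator is the only way to reach the tracks.
    fn into_iter(self) -> Self::IntoIter {
        self.tracks.into_iter()
    }
}
```

Once into_iter has been called on a Playlist, the compiler rejects any further use of the original value, so there is nothing left to invalidate.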

Doesn’t all this “moving” and “consuming” sound familiar, by the way? I’ve mentioned earlier that when we iterate over a vector with a for loop, we essentially move it “into the loop”.

As you can probably deduce by now, what really happens is that IntoIterator::into_iter is invoked on the vector. Its result (the iterator object) is then repeatedly next’ed until it returns None.

In a way, a for loop like this:

for x in v {
    // body
}

is therefore nothing but syntactic sugar for the following expanded version:

let mut iter = IntoIterator::into_iter(v);
loop {
    match iter.next() {
        Some(x) => {
            // body
        },
        None => break,
    }
}

You can see quite clearly that v is unusable not only after the loop ends, but before it even begins. This is because it has been moved into iter — into an iterator — through an into_iter method… of IntoIterator!

Simple, huh? :)

A for loop is just syntactic sugar for an IntoIterator::into_iter invocation, followed by repeated calls to Iterator::next.
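
As a quick sanity check, here are both forms side by side in a runnable sketch. The function names are mine, and the real compiler expansion differs in some details, but the observable behavior is the same:

```rust
/// Sums the elements with an ordinary for loop.
fn sum_with_for(v: Vec<i32>) -> i32 {
    let mut total = 0;
    for x in v {
        total += x;
    }
    total
}

/// Sums the elements with the hand-expanded equivalent of that loop.
fn sum_desugared(v: Vec<i32>) -> i32 {
    let mut total = 0;
    // v is moved into the iterator right here, before the loop starts.
    let mut iter = IntoIterator::into_iter(v);
    loop {
        match iter.next() {
            Some(x) => total += x,
            None => break,
        }
    }
    total
}
```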

The ampersand

On a more serious note, this move isn’t something that we’d always want to happen. Fortunately, we know a way to prevent it. Rather than iterating over the vector itself, use a reference to it:

for x in &v {
    // body
}

The great thing about this syntax is that everything said above still applies, up to and including the desugaring procedure. The into_iter method is still being invoked, except that this time it is done on the reference to the collection — &Vec<T> rather than Vec<T>:

// (simplified)
impl IntoIterator for &Vec<T> {
    fn into_iter(self) -> Iterator<Item=&T> { ... }
}

The result is therefore an iterator that yields references to the elements (&T), rather than elements themselves (T). And because self above is also a reference, the collection isn’t really moved anywhere, which is why we can freely access it after the loop ends.

The exact same thing happens when looping over a mutable reference:

for x in &mut v {
    // body
}

except that this time into_iter is called for &mut Vec<T>. The result is therefore of type Iterator<Item=&mut T>, enabling us to modify the elements as we go through them.

No further compiler machinery is required to support those two cases, because everything is already handled by the same trait.

The IntoIterator desugaring works the same way for collections and both immutable and mutable references to them.
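
A short sketch that exercises both reference forms on the same vector (double_all is just an illustrative name):

```rust
/// Doubles every element in place, then returns the vector and its sum.
fn double_all(mut v: Vec<i32>) -> (Vec<i32>, i32) {
    // Iterating over &mut Vec<i32> yields &mut i32, so we can modify items.
    for x in &mut v {
        *x *= 2;
    }
    // Iterating over &Vec<i32> yields &i32, which is enough for reading.
    let mut total = 0;
    for x in &v {
        total += *x;
    }
    // Both loops only borrowed v, so it can still be returned here.
    (v, total)
}
```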

What about the iter() method?

So far, we’ve talked about regular for loops, and the very imperative style of computation they represent.

If you are more inclined towards functional programming, though, you may have seen and written rather different constructs, combining various “fluent” methods into expressions such as this one:

let doubled_odds: Vec<_> = numbers.iter()
    .filter(|&x| x % 2 != 0).map(|&x| x * 2).collect();

Methods like map and filter here are called iterator adapters, and they are all defined on the Iterator trait. Not only are they very powerful and numerous, but they can also be supplemented through several third-party crates.

In order to take advantage of the adapters, however, we need to obtain an iterator for our collection first. We know that into_iter is the way loops normally do it, so in principle we could follow the same approach here:

let doubled_odds: Vec<_> = IntoIterator::into_iter(&numbers)
    .filter(|&x| x % 2 != 0).map(|&x| x * 2).collect();

To spare us the verbosity of this explicit syntax, collections normally offer an iter method which is exactly equivalent⁴. This method is what you will normally see in chained expressions like the one above.

v.iter() is just a shorthand for IntoIterator::into_iter(&v).
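
Wrapping both spellings in functions makes the equivalence easy to verify. This is a sketch with made-up function names, assuming numbers is a vector of i32:

```rust
/// Doubles the odd numbers, obtaining the iterator via the iter() shorthand.
fn doubled_odds_iter(numbers: &Vec<i32>) -> Vec<i32> {
    numbers.iter().filter(|&x| x % 2 != 0).map(|&x| x * 2).collect()
}

/// The same pipeline, obtaining the iterator via IntoIterator explicitly.
fn doubled_odds_into_iter(numbers: &Vec<i32>) -> Vec<i32> {
    IntoIterator::into_iter(numbers)
        .filter(|&x| x % 2 != 0)
        .map(|&x| x * 2)
        .collect()
}
```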

Why not both?

The last thing to note is that Rust mandates neither loops nor iterator adapters when writing code that operates on collections of elements. When optimizations are turned on in the release mode, both versions should compile to equally efficient machine code, with closures inlined and loops unrolled where necessary.

Choosing one over the other is therefore a matter of convention and taste. Sometimes the right choice may actually be a mix of both approaches, and Rust allows it without any complaints:

fn print_prime_numbers_upto(n: i32) {
    println!("Prime numbers lower than {}:", n);
    for x in (2..n).filter(|&i| is_prime(i)) {
        println!("{}", x);
    }
}

Like before, this is possible through the same for loop desugaring that involves the IntoIterator trait. In this case, Rust will simply use a no-op implementation of this trait, “converting” any existing Iterator “into” itself.

Iterators themselves are also “iterables”, implementing IntoIterator::into_iter as a pass-through.
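
The is_prime function was left undefined in the example above. Here is a runnable variant of the mixed style, assuming a naive trial-division test; it collects the primes into a vector instead of printing them, which makes the result easy to inspect:

```rust
/// A naive primality test by trial division (an assumption for this
/// sketch; the original example does not show its own is_prime).
fn is_prime(n: i32) -> bool {
    // Only divisors up to the square root of n need to be checked.
    n >= 2 && (2..n).take_while(|&d| d <= n / d).all(|d| n % d != 0)
}

/// Collects the primes below n by looping over an adapter chain directly.
fn primes_below(n: i32) -> Vec<i32> {
    let mut primes = Vec::new();
    // The Filter adapter is itself an Iterator, and every Iterator
    // implements IntoIterator by simply returning itself.
    for x in (2..n).filter(|&i| is_prime(i)) {
        primes.push(x);
    }
    primes
}
```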

Looping around

If you want to know even more about iterators and loops in Rust, the best source at this point is probably just the official documentation. And although mastering all the iterator adapters is of course not necessary to write effective Rust code, taking a careful look at least at the collect method (and the associated FromIterator trait) is definitely helpful.
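
As a small taste of what collect can do, the sketch below builds two different containers from very similar pipelines. The target type alone decides which FromIterator implementation gets used; the function names are hypothetical:

```rust
use std::collections::HashSet;

/// Collects the distinct word lengths into a set.
fn unique_lengths(words: &[&str]) -> HashSet<usize> {
    words.iter().map(|w| w.len()).collect()
}

/// Concatenates the uppercased words into a single String.
fn shout(words: &[&str]) -> String {
    words.iter().map(|w| w.to_uppercase()).collect()
}
```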

¹ The “two-semicolon” variant of the for loop doesn’t exist in Rust. Just like in Python, the equivalent is iterating over a range object, or using a regular while loop for more complex cases.
² This shift is completely transparent in the loop’s body. The way it works is based on a special Rust mechanism called Deref coercions. Without going into too much detail (as it is way out of scope for this post), this feature allows us to treat references to objects (&T) as if they were the objects themselves (T). The compiler will perform the necessary dereferencing where possible, or signal an error in case of a (rare) ambiguity.
³ How do we know that? It’s because into_iter takes self (rather than &self or &mut self) as its first parameter. It means that the entire object for which this method is called is moved into its body (hence the method’s name).
⁴ Curiously enough, this equivalence isn’t encoded in the type system in any way, making it technically just a convention. It is followed consistently at least in the standard library, though.