for loops in Rust

Posted on Tue 26 July 2016 in Code • Tagged with Rust, loops, iteratorsLeave a comment

In this post, I’m going to talk about the for loop construct in Rust, as well as the related concepts of iterators and “iterables”.

Depending on your programming language background, they may seem somewhat familiar in terms of syntax & semantics, or rather mysterious and surprising. Their closest analogues exist in Python, but programmers of Java, C#, or (modern) C++ should recognize many relevant features and ideas as well.

Basics

The syntax of a for loop is so modest it’s almost spartan1:

let v = vec!["1", "2", "3"];
for x in v {
    println!("{}", x);
}

As you would expect, this prints three lines with 1, 2, 3. What is probably not as obvious is that over the course of this loop the v vector was expended. Trying to use it after the iteration, we’ll get a borrow checker error:

<anon>:6:22: 6:23 error: use of moved value: `v` [E0382]
<anon>:4         println!("{}", x);
<anon>:5     }
<anon>:6     println!("{:?}", v);
                              ^

In Rust jargon, the vector has been moved into the loop. Its ownership — and that of its individual elements — has been transfered there permanently. While definitely surprising when compared to other languages, this behavior is consistent with Rust’s ubiquitous policy of moving values by default.

Still, you may not expect it here because moving ownership is mostly seen at the function call boundaries. For most intents and purposes, however, you can picture a for_each function like this to be the equivalent of the for loop above:

for_each(v, |x| println!("{}", x));

This also gives us a hint on how we could prevent the move from happening. Rather than taking the vector itself, the function could accept only a reference to it:

for_each_ref(&v, |x| println!("{}", x));

After we translate this back to the looping syntax:

 for x in &v {
    println!("{}", x);
}
println!("{:?}", v);

we won’t get any more objections from the compiler.

Iterators and “iterables” in Rust

It is important to emphasize that this new ampersand symbol (&) is by no means a part of the syntax of the for loop itself. We have actually changed what object we’re iterating here. It is no longer Vec<T> — a vector itself — but &Vec<T>, an immutable reference to it. As a consequence, x is not a T (the element type) anymore, but a &T — a reference to an element2.

So it seems that in Rust, both Vec<T> and &Vec<T> are what we would call “iterables”: collections (or other objects) that we can get iterate over. The usual way this is implemented in various programming languages is by introducing an iterator object.

The iterator keeps track of what element it’s currently pointing to and supports at least the following basic operations:

  • getting the current element
  • advancing to the next element
  • signaling when no more elements are available

Some languages provide separate iterator methods for each of those tasks, but Rust chooses to combine them all into one. You can see that when looking at the Iterator trait: next is the only method to be provided by its implementations.

Desugaring with into-iterators

How is the iterator object created, though?

In a typical Rust manner, this job is delegated to another trait. This one is called IntoIterator, and it roughly corresponds to the “iterable” concept I’ve alluded to earlier:

// (simplified)
trait IntoIterator {
    fn into_iter(self) -> Iterator;
}

What is uniquely Rusty is the fact that into_iter — the sole method of this trait — doesn’t just create a new iterator for the collection. Instead, it effectively consumes the whole thing, leaving the new iterator as its only remnant and the only way to access the items3.

This, of course, is a direct manifestation of the Rust’s move-by-default policy. In this case, it protects us from the common problem of iterator invalidation which is probably all-too-familiar to C++ programmers. Because the collection is essentially “converted” to an iterator here, it is impossible:

  • for more than one iterator to exist at a time
  • to modify the collection while any iterators are in scope

Doesn’t all this “moving” and “consuming” sound familiar, by the way? I’ve mentioned earlier that when we iterate over a vector with a for loop, we essentially move it “into the loop”.

As you can probably deduce by now, what really happens is that IntoIterator::into_iter is invoked on the vector. Its result — the iterator object — is then repeatedly nexted until it returns None.

In a way, a for loop like this:

for x in v {
    // body
}

is therefore nothing else but a syntactic sugar for the following expanded version:

let mut iter = IntoIterator::into_iter(v);
loop {
    match iter.next() {
        Some(x) => {
            // body
        },
        None => break,
    }
}

You can see quite clearly that v is unusable not only after the loop ends, but before it even begins. This is because it has been moved into iter — into an iterator — through an into_iter method… of IntoIterator!

Simple, huh? :)

for loop is just a syntactic sugar for an IntoIterator::into_iter invocation, followed by repeated calling of Iterator::next.

The ampersand

On a more serious note, this move isn’t something that we’d always want to happen. Fortunately, we know a way to prevent it. Rather than iterating over the vector itself, use a reference to it:

for x in &v {
    // body
}

The great thing about this syntax is that everything said above still applies, up to and including the desugaring procedure. The into_iter method is still being invoked, except that this time it is done on the reference to the collection&Vec<T> rather than Vec<T>:

// (simplified)
impl IntoIterator for &Vec<T> {
    fn into_iter(self) -> Iterator<Item=&T> { ... }
}

The result is therefore an iterator that yields references to the elements (&T), rather than elements themselves (T). And because self above is also a reference, the collection isn’t really moved anywhere, which is why we can freely access it after the loop ends.

The exact same thing happens when looping over a mutable reference:

for x in &mut v {
    // body
}

except that this time into_iter is called for &mut Vec<T>. Result is therefore of type Iterator<Item=&mut T>, enabling us to modify the elements as we go through them.

No further compiler machinery is required to support those two cases, because everything is already handled by the same trait.

The IntoIterator desugaring works the same way for collections and both immutable and mutable references to them.

What about the iter() method?

So far, we’ve talked about regular for loops, and the very imperative style of computation they represent.

If you are more inclined towards functional programming, though, you may have seen and written rather different constructs, combining various “fluent” methods into expressions such as this one:

let doubled_odds: Vec<_> = numbers.iter()
    .filter(|&x| x % 2 != 0).map(|&x| x * 2).collect();

Methods like map and filter here are called iterator adapters, and they are all defined on the Iterator trait. Not only are they very powerful and numerous, but they can also be supplemented through several third-party crates.

In order to take advantage of the adapters, however, we need to obtain an iterator for our collection first. We know that into_iter is the way loops normally do it, so in principle we could follow the same approach here:

let doubled_odds: Vec<_> = IntoIterator::into_iter(&numbers)
    .filter(|&x| x % 2 != 0).map(|&x| x * 2).collect();

To spare us the verbosity of this explicit syntax, collections normally offer an iter method which is exactly equivalent4. This method is what you will normally see in chained expressions like the one above.

v.iter() is just a shorthand for IntoIterator::into_iter(&v).

Why not both?

The last thing to note is that Rust mandates neither loops nor iterator adapters when writing code that operates on collections of elements. When optimizations are turned on in the release mode, both versions should compile to equally efficient machine code, with closures inlined and loops unrolled where necessary.

Choosing one style over the other is therefore a matter of convention and style. Sometimes the right choice may actually be a mix of both approaches, and Rust allows it without any complaints:

fn print_prime_numbers_upto(n: i32) {
    println!("Prime numbers lower than {}:", n);
    for x in (2..n).filter(|&i| is_prime(i)) {
        println!("{}", x);
    }
}

Like before, this is possible through the same for loop desugaring that involves the IntoIterator trait. In this case, Rust will simply use a no-op implementation of this trait, “converting” any existing Iterator into” itself.

Iterators themselves are also “iterables”, implementing IntoIterator::into_iter as a pass-through.

Looping around

If you want to know even more about iterators and loops in Rust, the best source at this point is probably just the official documentation. And although mastering all the iterator adapters is of course not necessary to write effective Rust code, taking a careful look at least at the collect method (and the associated FromIterator trait) is definitely helpful.


  1. The “two-semicolon” variant of the for loop doesn’t exist in Rust. Just like in Python, the equivalent is iterating over a range object, or using a regular while loop for more complex cases. 

  2. This shift is completely transparent in the loop’s body. The way it works is based on Rust’s special mechanism called the Deref coercions. Without going into too much detail (as it is way out of scope for this post), this feature allows us to treat references to objects (&T) as if they were the objects themselves (T). The compiler will perform the necessary derefencing where possible, or signal an error in case of a (rare) ambiguity. 

  3. How do we know that? It’s because into_iter takes self (rather than &self or &mut self) as its first parameter. It means that the entire object for which this method is called is moved into its body (hence the method’s name). 

  4. Curiously enough, this equivalence isn’t encoded in the type system in any way, making it technically just a convention. It is followed consistently at least in the standard library, though. 

Continue reading