for loops in Rust
Posted on Tue 26 July 2016 in Code • Tagged with Rust, loops, iterators • Leave a comment
In this post, I’m going to talk about the for
loop construct in Rust,
as well as the related concepts of iterators and “iterables”.
Depending on your programming language background, they may seem somewhat familiar in terms of syntax & semantics, or rather mysterious and surprising. Their closest analogues exist in Python, but programmers of Java, C#, or (modern) C++ should recognize many relevant features and ideas as well.
Basics
The syntax of a for
loop is so modest it’s almost spartan1:
let v = vec!["1", "2", "3"];
for x in v {
println!("{}", x);
}
As you would expect, this prints three lines with 1
, 2
, 3
.
What is probably not as obvious is that
over the course of this loop the v
vector was expended.
Trying to use it after the iteration,
we’ll get a borrow checker error:
<anon>:6:22: 6:23 error: use of moved value: `v` [E0382]
<anon>:4 println!("{}", x);
<anon>:5 }
<anon>:6 println!("{:?}", v);
^
In Rust jargon, the vector has been moved into the loop. Its ownership — and that of its individual elements — has been transfered there permanently. While definitely surprising when compared to other languages, this behavior is consistent with Rust’s ubiquitous policy of moving values by default.
Still, you may not expect it here
because moving ownership is mostly seen at the function call boundaries.
For most intents and purposes, however,
you can picture a for_each
function like this to be the equivalent of the for
loop above:
for_each(v, |x| println!("{}", x));
This also gives us a hint on how we could prevent the move from happening. Rather than taking the vector itself, the function could accept only a reference to it:
for_each_ref(&v, |x| println!("{}", x));
After we translate this back to the looping syntax:
for x in &v {
println!("{}", x);
}
println!("{:?}", v);
we won’t get any more objections from the compiler.
Iterators and “iterables” in Rust
It is important to emphasize that this new ampersand symbol (&
)
is by no means a part of the syntax of the for
loop itself.
We have actually changed what object we’re iterating here.
It is no longer Vec<T>
— a vector itself — but &Vec<T>
, an immutable reference to it.
As a consequence, x
is not a T
(the element type) anymore,
but a &T
— a reference to an element2.
So it seems that in Rust, both Vec<T>
and &Vec<T>
are what we would call “iterables”:
collections (or other objects) that we can get iterate over.
The usual way this is implemented in various programming languages
is by introducing an iterator object.
The iterator keeps track of what element it’s currently pointing to and supports at least the following basic operations:
- getting the current element
- advancing to the next element
- signaling when no more elements are available
Some languages provide separate iterator methods for each of those tasks,
but Rust chooses to combine them all into one.
You can see that when looking at the Iterator
trait:
next
is the only method to be provided by its implementations.
Desugaring with into-iterators
How is the iterator object created, though?
In a typical Rust manner, this job is delegated to another trait.
This one is called IntoIterator
,
and it roughly corresponds to the “iterable” concept I’ve alluded to earlier:
// (simplified)
trait IntoIterator {
fn into_iter(self) -> Iterator;
}
What is uniquely Rusty is the fact that into_iter
— the sole method of this trait —
doesn’t just create a new iterator for the collection.
Instead, it effectively consumes the whole thing,
leaving the new iterator as its only remnant and the only way to access the items3.
This, of course, is a direct manifestation of the Rust’s move-by-default policy. In this case, it protects us from the common problem of iterator invalidation which is probably all-too-familiar to C++ programmers. Because the collection is essentially “converted” to an iterator here, it is impossible:
- for more than one iterator to exist at a time
- to modify the collection while any iterators are in scope
Doesn’t all this “moving” and “consuming” sound familiar, by the way?
I’ve mentioned earlier that when we iterate over a vector with a for
loop,
we essentially move it “into the loop”.
As you can probably deduce by now,
what really happens is that IntoIterator::into_iter
is invoked on the vector.
Its result — the iterator object — is then repeatedly next
‘ed until it returns None
.
In a way, a for
loop like this:
for x in v {
// body
}
is therefore nothing else but a syntactic sugar for the following expanded version:
let mut iter = IntoIterator::into_iter(v);
loop {
match iter.next() {
Some(x) => {
// body
},
None => break,
}
}
You can see quite clearly that v
is unusable not only after the loop ends,
but before it even begins.
This is because it has been moved into iter
— into an iterator —
through an into_iter
method… of IntoIterator
!
Simple, huh? :)
for
loop is just a syntactic sugar for anIntoIterator::into_iter
invocation, followed by repeated calling ofIterator::next
.
The ampersand
On a more serious note, this move isn’t something that we’d always want to happen. Fortunately, we know a way to prevent it. Rather than iterating over the vector itself, use a reference to it:
for x in &v {
// body
}
The great thing about this syntax is that everything said above still applies,
up to and including the desugaring procedure.
The into_iter
method is still being invoked,
except that this time it is done on the reference to the collection —
&Vec<T>
rather than Vec<T>
:
// (simplified)
impl IntoIterator for &Vec<T> {
fn into_iter(self) -> Iterator<Item=&T> { ... }
}
The result is therefore an iterator that yields references to the elements (&T
),
rather than elements themselves (T
).
And because self
above is also a reference,
the collection isn’t really moved anywhere,
which is why we can freely access it after the loop ends.
The exact same thing happens when looping over a mutable reference:
for x in &mut v {
// body
}
except that this time into_iter
is called for &mut Vec<T>
.
Result is therefore of type Iterator<Item=&mut T>
,
enabling us to modify the elements as we go through them.
No further compiler machinery is required to support those two cases, because everything is already handled by the same trait.
The
IntoIterator
desugaring works the same way for collections and both immutable and mutable references to them.
What about the iter() method?
So far, we’ve talked about regular for
loops,
and the very imperative style of computation they represent.
If you are more inclined towards functional programming, though, you may have seen and written rather different constructs, combining various “fluent” methods into expressions such as this one:
let doubled_odds: Vec<_> = numbers.iter()
.filter(|&x| x % 2 != 0).map(|&x| x * 2).collect();
Methods like map
and filter
here
are called iterator adapters,
and they are all defined on the Iterator
trait.
Not only are they very powerful and numerous,
but they can also be supplemented through several third-party crates.
In order to take advantage of the adapters, however,
we need to obtain an iterator for our collection first.
We know that into_iter
is the way loops normally do it,
so in principle we could follow the same approach here:
let doubled_odds: Vec<_> = IntoIterator::into_iter(&numbers)
.filter(|&x| x % 2 != 0).map(|&x| x * 2).collect();
To spare us the verbosity of this explicit syntax,
collections normally offer an iter
method which is exactly equivalent4.
This method is what you will normally see in chained expressions like the one above.
v.iter()
is just a shorthand forIntoIterator::into_iter(&v)
.
Why not both?
The last thing to note is that Rust mandates neither loops nor iterator adapters when writing code that operates on collections of elements. When optimizations are turned on in the release mode, both versions should compile to equally efficient machine code, with closures inlined and loops unrolled where necessary.
Choosing one style over the other is therefore a matter of convention and style. Sometimes the right choice may actually be a mix of both approaches, and Rust allows it without any complaints:
fn print_prime_numbers_upto(n: i32) {
println!("Prime numbers lower than {}:", n);
for x in (2..n).filter(|&i| is_prime(i)) {
println!("{}", x);
}
}
Like before, this is possible through the same for
loop desugaring
that involves the IntoIterator
trait.
In this case, Rust will simply use a
no-op implementation
of this trait, “converting” any existing Iterator
“into” itself.
Iterators themselves are also “iterables”, implementing
IntoIterator::into_iter
as a pass-through.
Looping around
If you want to know even more about iterators and loops in Rust,
the best source at this point is probably just
the official documentation.
And although mastering all the iterator adapters is of course not necessary to write effective Rust code,
taking a careful look at least at
the collect
method
(and the associated FromIterator
trait)
is definitely helpful.
-
The “two-semicolon” variant of the
for
loop doesn’t exist in Rust. Just like in Python, the equivalent is iterating over a range object, or using a regularwhile
loop for more complex cases. ↩ -
This shift is completely transparent in the loop’s body. The way it works is based on Rust’s special mechanism called the
Deref
coercions. Without going into too much detail (as it is way out of scope for this post), this feature allows us to treat references to objects (&T
) as if they were the objects themselves (T
). The compiler will perform the necessary derefencing where possible, or signal an error in case of a (rare) ambiguity. ↩ -
How do we know that? It’s because
into_iter
takesself
(rather than&self
or&mut self
) as its first parameter. It means that the entire object for which this method is called is moved into its body (hence the method’s name). ↩ -
Curiously enough, this equivalence isn’t encoded in the type system in any way, making it technically just a convention. It is followed consistently at least in the standard library, though. ↩