Tricks with ownership in Rust
Posted on Mon 07 March 2016 in Code
…or how I learned to stop worrying and love the borrow checker.
Having no equivalents in other languages, the borrow checker is arguably the most difficult thing to come to terms with when learning Rust. It’s easy to understand why it’s immensely useful, especially if you recall the various vulnerabilities stemming from memory mismanagement. But that knowledge doesn’t exactly help when the compiler is whining about what seems like perfectly correct code.
Let’s face it: it will take some time to become productive writing efficient and safe code. It’s not entirely unlike adjusting to a different paradigm such as functional programming when you’ve been writing mostly imperative code. Before that happens, though, you can use some tricks to make the transition a little easier.
Just clone it
Ideally, we’d want our code to be both correct and fast. But if we cannot quite get to the “correctness” part yet — because our program doesn’t, you know, compile — then how about paying for it with a small (and refundable) performance hit?
This is where the clone method comes in handy. Many problems with the borrow checker stem from trying to spread object ownership too thin. It is a precious resource and it’s not very cheap to “produce”, which is why good Rust code often deals with just immutable or mutable references.
But if that proves difficult, then “making more objects” is a great intermediate solution. Incidentally, this is what higher level languages are doing all the time, and often transparently. To ease the transition to Rust from those languages, we can start off by replicating their behavior.
As an example, consider a function that tries to convert some value to String:
struct Error;

fn maybe_to_string<T>(v: T) -> Result<String, Error> {
    // omitted
}
If we attempt to build upon it and create a Vec version:
fn maybe_all_to_string<T>(v: Vec<T>) -> Result<Vec<String>, Error> {
    let results: Vec<_> = v.iter().map(maybe_to_string).collect();
    if let Some(res) = results.iter().find(|r| r.is_err()) {
        return Err(Error);
    }
    Ok(results.iter().map(|r| r.ok().unwrap()).collect())
}
then we’ll be unpleasantly surprised by a borrow checker error:
error: cannot move out of borrowed content [E0507]
Ok(results.iter().map(|r| r.ok().unwrap()).collect())
^
Much head scratching will ensue, and we may eventually find an idiomatic and efficient solution.
However, a simple stepping stone in the shape of an additional clone() call can help move things forward just a little quicker:
#[derive(Clone)]
struct Error;
// ...
Ok(results.iter().map(|r| r.clone().ok().unwrap()).collect())
The performance tradeoff is explicit, and easy to find later on with a simple grep clone\(\) or similar.
When you learn to do things the Rusty way, it won’t be hard to go back to your “hack” and fix it properly.
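For reference, here is how the whole workaround might look once assembled into something that compiles. This is only a sketch: the body of maybe_to_string is omitted in the example above, so the Display bound and the "empty string is an error" rule below are made-up stand-ins, and the Debug and PartialEq derives are there purely to make the result easy to compare.

```rust
use std::fmt::Display;

// Clone is what the workaround needs; Debug and PartialEq are extras
// added here only so that results can be compared in assertions.
#[derive(Clone, Debug, PartialEq)]
struct Error;

// Hypothetical stand-in for the omitted body: anything printable converts,
// except that an empty string counts as a failure.
fn maybe_to_string<T: Display>(v: T) -> Result<String, Error> {
    let s = v.to_string();
    if s.is_empty() { Err(Error) } else { Ok(s) }
}

fn maybe_all_to_string<T: Display>(v: Vec<T>) -> Result<Vec<String>, Error> {
    let results: Vec<_> = v.iter().map(maybe_to_string).collect();
    if results.iter().any(|r| r.is_err()) {
        return Err(Error);
    }
    // `r` is only a reference to a Result; cloning it produces an owned
    // Result that ok() is allowed to consume.
    Ok(results.iter().map(|r| r.clone().ok().unwrap()).collect())
}
```

Every successfully converted String gets copied once on the last line, which is exactly the (refundable) performance hit we agreed to pay.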
Refcounting to the rescue
Adding clone() willy-nilly to make the code compile is a valid workaround when we’re just learning. Sometimes, however, even some gratuitous cloning doesn’t quite solve the problem, because the clone() itself can become an issue.
For one, it requires our objects to implement the Clone trait. This was apparent even in our previous example, since we had to add a #[derive(Clone)] attribute to the struct Error in order to make it cloneable.
Fortunately, in the vast majority of cases this will be all that’s necessary, as most built-in types in Rust implement Clone already. One notable exception is the function traits (FnOnce, Fn, and FnMut), which are used to store and refer to closures [1]. Structures and other custom types that contain them (or those which may contain them) therefore cannot implement Clone through a simple #[derive] annotation:
/// A value that's either there already
/// or can be obtained by calling a function.
#[derive(Clone)]
enum LazyValue<T: Clone> {
    Immediate(T),
    Deferred(Fn() -> T),
}
error: the trait `core::marker::Sized` is not implemented for the type `core::ops::Fn() -> T + 'static` [E0277]
#[derive(Clone)]
^~~~~
What can we do in this case, then? Well, there is yet another kind of performance concession we can make, and this one will likely sound familiar if you’ve ever worked with a higher level language before. Instead of actually cloning an object, you can merely increment its reference counter. As the most rudimentary kind of garbage collection, this allows the object to be safely shared between multiple “owners”, where each can behave as if it had its own copy of it.
Rust’s pointer type that provides reference counting capabilities is called std::rc::Rc. Conceptually, it is analogous to std::shared_ptr from C++, and it similarly keeps the refcount updated when the pointer is “acquired” (cloned) and “released” (dropped). Because no data is moved around during either of those two operations, Rc can refer even to types whose size isn’t known at compilation time, like abstract closures:
use std::rc::Rc;

#[derive(Clone)]
enum LazyValue<T: Clone> {
    Immediate(T),
    // Current Rust editions spell the trait object type with `dyn`.
    Deferred(Rc<dyn Fn() -> T>),
}
Wrapping them in Rc therefore makes them “cloneable”. They aren’t actually cloned, of course, but because of the inherent immutability of Rust types they will appear so to any outside observer [2].
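To see the “fake cloning” in action, the sketch below extends the enum with a few helpers that are not part of the original example: get, deferred_answer and owners are hypothetical names invented for illustration. It uses the dyn spelling of trait objects that current Rust editions expect.

```rust
use std::rc::Rc;

#[derive(Clone)]
enum LazyValue<T: Clone> {
    Immediate(T),
    Deferred(Rc<dyn Fn() -> T>),
}

impl<T: Clone> LazyValue<T> {
    // Hypothetical helper: produce the value, calling the closure if needed.
    fn get(&self) -> T {
        match self {
            LazyValue::Immediate(v) => v.clone(),
            LazyValue::Deferred(f) => f(),
        }
    }
}

// Hypothetical constructor used only for this demonstration.
fn deferred_answer() -> LazyValue<i32> {
    let f: Rc<dyn Fn() -> i32> = Rc::new(|| 40 + 2);
    LazyValue::Deferred(f)
}

// Reports how many owners currently share the closure, if there is one.
fn owners<T: Clone>(value: &LazyValue<T>) -> Option<usize> {
    match value {
        LazyValue::Immediate(_) => None,
        LazyValue::Deferred(f) => Some(Rc::strong_count(f)),
    }
}
```

Cloning a Deferred value bumps the count reported by owners from 1 to 2, and dropping the clone brings it back down; the closure itself is never duplicated.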
Move it!
Ultimately, most problems with the borrow checker boil down to unskillful mixing of the two ways you handle data in Rust. There is ownership, which is passed around by moving the values; and there is borrowing, which means operating on them through references.
When you try to switch from one to the other, some friction is bound to occur. Code that uses references, for example, has to be copiously sprinkled with & and &mut, and may sometimes require explicit lifetime annotations. All these have to be added or removed, and changes like that tend to propagate quite readily to the upper layers of the program’s logic.
Therefore it is generally preferable, if at all possible, to deal with data directly and not through references. To maintain efficiency, however, we need to learn how to move the objects through the various stages of our algorithms. It turns out it’s surprisingly easy to inadvertently borrow something, hindering the possibility of producing a moved value.
Take our first example. The intuitively named Vec::iter method produces an iterator that we can map over, but does it really go over the actual items in the vector? Nope! It gives us a reference to each one (a borrow, if you will), which is exactly why we originally had to use clone to get out of this bind.
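A tiny sketch makes the borrowing visible. The function name lengths_then_return is made up for this illustration; the type annotation on the closure parameter is there only to spell out what iter() hands us.

```rust
fn lengths_then_return(words: Vec<String>) -> (Vec<usize>, Vec<String>) {
    // iter() borrows: the closure receives `&String`, never `String`.
    let lengths: Vec<usize> = words.iter().map(|w: &String| w.len()).collect();
    // The vector was only borrowed above, so it is still ours to hand back.
    (lengths, words)
}
```

Reading through a reference is fine, which is why w.len() works. Anything that needs to consume the item, such as ok() on a Result, is where the borrow checker objects.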
Instead, why not just get the elements themselves, by moving them out of the vector? Vec::into_iter lets us do exactly this:
Ok(results.into_iter().map(|r| r.ok().unwrap()).collect())
and enables us to remove the clone() call. The family of similar into_X (or even just into) methods can be reliably counted on, at least in the standard library. They are also part of a more-or-less official naming convention that you should follow in your own code as well.
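Put together, the move-based version might look like the sketch below. As before, the body of maybe_to_string is a hypothetical stand-in (the Display bound and the empty-string rule are assumptions), and the Debug and PartialEq derives exist only for comparing results. The second function is not from this post: it shows one commonly used shortcut, collecting an iterator of Result values directly into a Result of a Vec, which stops at the first error.

```rust
use std::fmt::Display;

// No Clone needed any more; Debug and PartialEq are only for assertions.
#[derive(Debug, PartialEq)]
struct Error;

// Hypothetical stand-in for the omitted body.
fn maybe_to_string<T: Display>(v: T) -> Result<String, Error> {
    let s = v.to_string();
    if s.is_empty() { Err(Error) } else { Ok(s) }
}

fn maybe_all_to_string<T: Display>(v: Vec<T>) -> Result<Vec<String>, Error> {
    let results: Vec<_> = v.iter().map(maybe_to_string).collect();
    if results.iter().any(|r| r.is_err()) {
        return Err(Error);
    }
    // into_iter() consumes `results` and yields each Result by value,
    // so ok() can take ownership without any cloning.
    Ok(results.into_iter().map(|r| r.ok().unwrap()).collect())
}

// Alternative sketch: let collect() do the error handling for us.
fn maybe_all_to_string_short<T: Display>(v: Vec<T>) -> Result<Vec<String>, Error> {
    v.into_iter().map(maybe_to_string).collect()
}
```

Both functions return the same results for the same input; the shorter one simply never builds the intermediate vector of Result values.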
[1] Note how this is different from function types, i.e. fn(A, B, C, ...) -> Ret. It is because plain functions do not carry their closure environments along with them. This makes them little more than just pointers to some code, and those can be freely Clone-d (or even Copy-ed).
[2] If you want both shared ownership (“fake cloneability”) and the ability to mutate the shared value, take a look at the RefCell type and how it can be wrapped in Rc to achieve both.
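As a rough illustration of the second footnote, here is what the Rc and RefCell combination can look like. The shared_counter and increment functions are hypothetical names invented for this sketch.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical example: two handles to one shared, mutable counter.
fn shared_counter() -> (Rc<RefCell<u32>>, Rc<RefCell<u32>>) {
    let first = Rc::new(RefCell::new(0));
    // Rc::clone bumps the refcount; the counter itself is not copied.
    let second = Rc::clone(&first);
    (first, second)
}

fn increment(counter: &Rc<RefCell<u32>>) {
    // borrow_mut() checks the borrowing rules at run time and panics
    // if the value is already borrowed elsewhere.
    *counter.borrow_mut() += 1;
}
```

Incrementing through either handle is visible through the other, because both point at the same RefCell.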