Bugs You'll Probably Only Have in Rust
June 14, 2017
The source of this post is hosted on GitHub. If you find any errors, please file an issue or submit a PR!
Recently, Ralf Jung found a bug in Rust's standard library. Congrats to him!
The bug was a missing annotation, and the result was that users of Rust's stdlib could compile some incorrect programs that violated memory safety. Fortunately, those incorrect programs would be pretty surprising to see in the wild, so no one should be affected. Adding the annotation should fix the bug without anyone noticing.
On Twitter an interesting question was raised: is this the first Rust-specific bug class?
Nope! There are actually a few of them, which is why I did my best to document them in The Rustonomicon (which I've started working on again, and hope to improve and finish). But the Rust-specific bugs documented in the nomicon are mixed in with lots of details and bugs that happen in other languages, so I thought it would be interesting to put all the Rust-specific ones together in one place.
§What These Bugs Aren't
So just to be clear, there are lots of bugs that Rust code can have. Many of them you would expect to see in any language: integer overflow, index out of bounds, failing to sanitize input, etc.
Others are kinda Rust-specific, in that they're a consequence of certain features: exception safety, dangling raw pointers, inappropriately trusting interfaces, etc. However, we won't be focusing on these issues because they're just as relevant in Rust's close relatives, C++ and Swift. (Swift doesn't really have exception safety issues; and C++ doesn't really have trust issues, insofar as C++ doesn't really try to distinguish between "safe" and "unsafe" operations.)
But the bugs I want to focus on are ones I've only seen in Rust code, as a result of unique type-system interactions. I won't go so far as to say that no other language has these issues; I just don't know of any, and I wouldn't expect to see them.
Also I should note that all the bugs I want to focus on are related to writing unsafe code, so these aren't things you generally need to worry about day-to-day when using Rust. They're related to unsafe code because they live in places where we're doing clever/subtle things for you that are always correct in safe code, but fall over once you start doing the things unsafe lets you do.
§Ralf's Bug: Oops Unsafe Autotraits
TL;DR: if you write unsafe, you must closely audit Send and Sync for your types!
This one's going to need a fair bit of background, because it's the consequence of several obscure Rust features.
There's a trait (interface) in Rust called Sync. It's a marker trait, which just means that the interface doesn't actually have any methods or types you need to provide to implement it. It's basically a type-level boolean: either you implement it or you don't. Implementing Sync is a declaration that your type has a property: it's safe for two threads to access it concurrently. Generic APIs that want to share data between threads express that by requiring T: Sync, so that they're safe.
Almost every type in Rust can implement Sync, because sharing in Rust means read-only access. There's no worry about two threads reading from an integer at the same time, so integers are Sync. Similarly, a type made of integers is safe, and so on. The interesting Sync types come from concurrency primitives that actually let you mutate in shared contexts: Mutex, RwLock, AtomicUsize, etc. Very few types aren't Sync. The most notable one is probably Rc, a non-atomically reference-counted pointer, because it lets you increment the count in shared contexts without proper synchronization.
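For example, here's a sketch of a generic function that hands shared data to another thread; the T: Sync bound is exactly what makes it sound (share_with_thread is a hypothetical helper, not a stdlib API):

use std::thread;

// &T is Send precisely when T is Sync, so this bound is what lets the
// reference cross the thread boundary.
fn share_with_thread<T: Sync + std::fmt::Debug>(data: &'static T) {
    thread::spawn(move || {
        println!("reading from another thread: {:?}", data);
    });
}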
There are two quirks about Sync that make it different from almost any other trait in Rust: it's unsafe, and it's an autotrait.
Being unsafe just means that when you implement Sync, you have to use the unsafe keyword. The benefit of this is purely a matter of rules lawyering: it must be impossible to violate memory safety with Rust and its standard library, unless you wrote unsafe. We make Sync unsafe because, unlike most other traits, there is no "defense" that can be taken against a bad Sync implementation. If we get it wrong, then we're letting our users write data races, which violate memory safety!
But stopping there would have been an unacceptable state of affairs. Making unsafe a big scary "all bets are off" button is only compelling if most of our users don't need to press that button. Rust is trying to be a language for writing concurrent applications, so requiring unsafe just to share your type between threads would be really bad.
This is solved by Sync being an autotrait. An autotrait, which is a feature that is currently unstable and basically only used by the standard library, is a trait that's structurally derived for all types in existence. So if your type contains only Sync types, you're Sync without saying a word about it, and everyone gets to share their types across threads without writing unsafe!
For almost every type in existence, this works perfectly: if you're made up of things that can be safely shared between threads, then you can be safely shared between threads. If you contain anything that can't be, then you can't be.
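A quick illustration of the structural rule (assert_sync is just a hypothetical helper for poking at the compiler):

use std::rc::Rc;

struct Point { x: u64, y: u64 }   // contains only Sync types
struct Counted { count: Rc<u64> } // contains a non-Sync type

fn assert_sync<T: Sync>() {}

fn main() {
    assert_sync::<Point>();   // fine: derived structurally, no impl written
    // assert_sync::<Counted>(); // ERROR: Rc<u64> isn't Sync, so neither is Counted
}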
For the few types where this system gets the wrong answer, autotraits can be manually implemented or unimplemented. It is in these manual implementations that you must write unsafe and accept the burden of memory safety. As a safety mechanism, several "fishy" types explicitly don't implement Sync, so that any type containing them has to explicitly declare that it's ok. The most notable example of this is *mut, the raw pointer type.
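So a type that stores a raw pointer loses its automatic Sync, and its author has to opt back in explicitly; a minimal sketch (MyBox is hypothetical):

struct MyBox {
    data: *mut u8, // raw pointers don't implement Sync, so neither does MyBox...
}

// ...unless we explicitly claim otherwise. The unsafe keyword marks that we,
// not the compiler, are now responsible for this being correct.
unsafe impl Sync for MyBox {}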
And here we get to Ralf's Bug: a type (MutexGuard) autoderived Sync when it shouldn't have. That's it. We should have been a bit more vigilant and noticed this, but we didn't. Dang.
But wait, how could this have happened? I just argued that it should be impossible to violate memory safety without writing unsafe, but our bug just violated memory safety, and it was because we didn't write anything!
The answer is that, in fact, we did write unsafe: in the implementation of MutexGuard. (Score one for the rules lawyers!) The only reason MutexGuard being Sync is problematic is because it has imbued itself with special semantics using unsafe code, namely the ability to reach into a Mutex and access its data without any synchronization.
Usually this isn't a problem because our "fishy type" protection catches it. But in this case MutexGuard just holds a perfectly normal reference to a Mutex, and Mutexes are Sync, so that should be fine!
So there's our first bug that You Should Only See In Rust: writing unsafe code, and failing to audit the stdlib's two unsafe autotraits for your types: Send and Sync.
And here's the fix: properly overriding the Sync impl for MutexGuard... right next to the line where we had already done that for Send two years ago.
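The shape of that fix, sketched with a toy MutexGuard (simplified from the actual stdlib change): writing any explicit impl of an autotrait replaces the structurally derived one.

use std::sync::Mutex;

// A stand-in for the real MutexGuard: it just holds a reference to the lock.
struct MutexGuard<'a, T: ?Sized + 'a> {
    lock: &'a Mutex<T>,
}

// The guard is now Sync only when the data it exposes is itself Sync,
// instead of whenever &Mutex<T> happens to be Sync.
unsafe impl<'a, T: ?Sized + Sync> Sync for MutexGuard<'a, T> {}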
§Unbound Lifetimes
TL;DR: don't infer the lifetimes that unsafe pointers produce!
So Rust has these lifetime things. They're type-level variables for talking about scopes/regions in your program. We attach them to references to indicate what scope of the program we can guarantee the reference will be valid for. Mostly, lifetimes only show up to create a relationship between two references. For example:
fn foo<'a>(input: &'a u32) -> &'a u32;
Here we're saying "I take in any reference to a u32, and return a reference to a u32 that lasts at least as long". Because this is an incredibly common idiom, we have a rule that lets you leave off the lifetimes here. So this says the same thing:
fn foo(input: &u32) -> &u32;
When a lifetime in the output also appears in the input (as is the case here), we say that it's bound by the input. This gives the compiler something to reason about. Specifically, it determines that the output must, somehow, be derived from the input. So the caller of our function can make sure the input stays valid as long as we keep around the output:
fn main() {
    let x;
    {
        let y = 0;
        x = foo(&y);
    }
    println!("{}", x);
}
error: `y` does not live long enough
--> <anon>:10:5
|
9 | x = foo(&y);
| - borrow occurs here
10 | }
| ^ `y` dropped here while still borrowed
11 | println!("{}", x);
12 | }
| - borrowed value needs to live until here
And the body of our function can make sure the output is derived from the input (or something that lives longer):
fn foo1(input: &u32) -> &u32 {
    return input; // ok, output derived from input
}

static global_int: u32 = 0;
fn foo2(input: &u32) -> &u32 {
    return &global_int; // ok, globals live longer than everything
}

fn foo3(input: &u32) -> &u32 {
    let temp = 0;
    return &temp; // ERROR: doesn't live long enough
}
error: `temp` does not live long enough
--> <anon>:12:13
|
12 | return &temp;
| ^^^^ does not live long enough
13 | }
| - borrowed value only lives until here
|
Ok, so what does the compiler do if you give it a lifetime that isn't bound, an unbound lifetime?
fn foo1<'a>() -> &'a u32 {
    let temp = 0;
    return &temp; // ERROR: doesn't live long enough
}

static global_int: u32 = 0;
fn foo2<'a>() -> &'a u32 {
    return &global_int; // ok, globals live longer than everything
}
error: `temp` does not live long enough
--> <anon>:3:13
|
3 | return &temp;
| ^^^^ does not live long enough
4 | }
| - borrowed value only lives until here
|
note: borrowed value must be valid for the lifetime 'a as defined on the body at 1:25...
--> <anon>:1:26
|
1 | fn foo1<'a>() -> &'a u32 {
| __________________________^
2 | | let temp = 0;
3 | | return &temp;
4 | | }
| |_^
It prevents us from using anything other than a reference to a global. Ok, that seems reasonable.
But what happens when we throw some unsafe code at the issue?
fn foo<'a>(input: *const u32) -> &'a u32 {
    unsafe {
        return &*input;
    }
}

fn main() {
    let x;
    {
        let y = 7;
        x = foo(&y);
    }
    println!("hello: {}", x);
}
hello: 2384275376
UHHHH
It turns out that unbound lifetimes are really bad when unsafe code gets involved.
Basically, if you dereference a raw pointer, this produces an unbound lifetime, because we don't know what scope could possibly be right. Unbound lifetimes are basically the "yes men" of the type system: they change shape to fit whatever scope we ask them to. This is genuinely useful in the case of raw pointers, because there isn't a better answer. Normally it's also fine, because we either discard the reference quickly, or we pass it somewhere that causes the lifetime to become bound and act reasonably (a struct field, or the return value of a function).
But if you mess up your signatures and leave an unbound lifetime in there... look out!
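The way to stay safe is to make sure the reference you conjure gets a lifetime derived from some input. A sketch of that pattern, using a hypothetical Owner type:

struct Owner {
    ptr: *const u32, // assume this always points at a valid u32 we own
}

impl Owner {
    fn get<'a>(&'a self) -> &'a u32 {
        // The raw deref still produces an unbound lifetime, but the
        // signature immediately binds it to the borrow of self.
        unsafe { &*self.ptr }
    }
}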
Note that our code only compiled because we used an explicit lifetime variable; this doesn't compile:
fn foo(input: *const u32) -> &u32 {
    unsafe {
        return &*input;
    }
}
error[E0106]: missing lifetime specifier
--> <anon>:1:30
|
1 | fn foo(input: *const u32) -> &u32 {
| ^ expected lifetime parameter
|
= help: this function's return type contains a borrowed value with an elided lifetime, but the lifetime cannot be derived from the arguments
= help: consider giving it an explicit bounded or 'static lifetime
So one way to keep yourself safe is to use lifetime elision as much as possible.
Here's a (pre-1.0) bug where we messed this up in the Rust standard library.
§Destructor Leaking - The Leakpocalypse
TL;DR: don't rely on borrows expiring meaning destructors ran!
Every language I know of that has destructors comes with a little asterisk: destructors aren't guaranteed to actually run. Or to put it another way:
THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
In Rust, Swift, and C++, the standard example of this is stuffing a type-with-a-destructor inside a reference-counted cycle. Even when no more references to the data remain, the cycle will keep itself alive and "leak" the destructor forever.
In Swift and C++ this is kinda annoying, but not a big deal. In Rust, it's a bigger deal.
This is because in Rust, unlike in the other two languages, "something is no longer alive" is an event that lets you do new things with your still-alive values: it can free up borrows!
let mut x = 0;
{
    let y = &mut x;
    println!("{}", x); // ERROR: x is borrowed
}
println!("{}", x); // OK
error[E0502]: cannot borrow `x` as immutable because it is also borrowed as mutable
--> <anon>:5:24
|
4 | let y = &mut x;
| - mutable borrow occurs here
5 | println!("{}", x); // ERROR: x is borrowed
| ^ immutable borrow occurs here
6 | }
| - mutable borrow ends here
This is an important tool for API design in Rust. It's what lets us safely make iterators for collections without having to copy them or add invalidation checks. It lets us safely keep pointers to reference-counted data without actually touching the reference count. Being able to rely on "a borrow has expired" meaning the thing that held the borrow is gone forever is good and safe.
What you can't do is assume that the expiry of a borrow held by some value means that value had its destructor run. For instance, if you do the reference-counted cycle leak trick, Rust will correctly determine that the borrow can safely expire; after all, no one can ever access the reference that was leaked.
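You don't even need a cycle to see this; mem::forget skips a destructor outright, and yet it's a safe function:

use std::mem;

fn main() {
    let v = vec![1, 2, 3];
    let r = &v;
    println!("{:?}", r); // the borrow of v expires here as usual...
    mem::forget(v);      // ...but v's destructor never runs: the memory leaks
}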
The Rust stdlib devs fell afoul of this, and had developed two APIs that relied on this assumption. When this was pointed out, there was a bit of a panic in the community, and thus the Leakpocalypse occurred.
In the end we declared that this sort of thing was just a fact of life. However, while handling the fallout of this decision, we figured out a way to make this pattern work, sort of.
For example, drain was an experimental API that relied on destructors running. It was an iterator that moved the values out of a collection without actually consuming the collection's allocation, as if you called pop until there was nothing left. Of course, we wanted to do this without creating any temporary allocations or keeping the collection in a valid state during iteration.
We reasoned this was safe because the Drain iterator would hold a mutable borrow on the collection, statically preventing anyone from accessing it while it was removing elements. The destructor of the Drain would empty out any elements that hadn't been read yet, and then update the collection's metadata to reflect its emptiness. Perfect!
Except, someone could leak the Drain's destructor, and then the collection would be accessible in a corrupt state.
The solution to this was a trick we called leak amplification, because it basically raised the stakes of a destructor leak: if you leak me, I'll leak you right back.
The trick was to change the code that constructed a Drain so that it told the collection it was empty, up front. That way, if at any point the Drain was leaked, the collection would find itself in a perfectly valid state. However, this also meant that leaking the Drain leaked all of the unprocessed elements, destructors and all. You leak me, I'll leak you right back.
However if the Drain destructor was allowed to run, which is almost always the case, then it would do all the cleanup and no one would know the difference. Nice!
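Here's a minimal sketch of the idea for a Vec-like type (MyVec and this Drain are illustrative, not the actual stdlib source):

use std::ptr;

pub struct MyVec<T> {
    ptr: *mut T,
    len: usize,
}

pub struct Drain<'a, T: 'a> {
    vec: &'a mut MyVec<T>,
    index: usize,
    end: usize,
}

impl<T> MyVec<T> {
    pub fn drain(&mut self) -> Drain<T> {
        let end = self.len;
        // Leak amplification: mark ourselves empty *before* handing out the
        // iterator. If the Drain is leaked, the unread elements leak with it,
        // but MyVec is never observable in a corrupt state.
        self.len = 0;
        Drain { vec: self, index: 0, end }
    }
}

impl<'a, T> Iterator for Drain<'a, T> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        if self.index == self.end {
            return None;
        }
        let item = unsafe { ptr::read(self.vec.ptr.add(self.index)) };
        self.index += 1;
        Some(item)
    }
}

impl<'a, T> Drop for Drain<'a, T> {
    fn drop(&mut self) {
        // The usual path: clean up any elements the caller never read.
        while let Some(item) = self.next() {
            drop(item);
        }
    }
}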
Sadly, some APIs don't have any way to apply leak amplification, as was the case for the original design of thread::scoped. For that API, we had to force the user to run their code inside a closure we call, so that we could run any cleanup we wanted after their closure terminated. This works; it's just less ergonomic and composable.
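The shape of that closure-based design, heavily simplified (Scope and its methods are hypothetical stand-ins for the real bookkeeping):

// Bookkeeping for spawned threads would live in here.
struct Scope {}

impl Scope {
    fn new() -> Scope { Scope {} }
    fn join_all(&self) { /* wait for every spawned thread to finish */ }
}

// Because we call the closure, the cleanup after it is guaranteed to run;
// there's no guard object the user could leak. (A real implementation
// would also need to run join_all on panic.)
fn scope<F, R>(f: F) -> R
where
    F: FnOnce(&Scope) -> R,
{
    let s = Scope::new();
    let result = f(&s);
    s.join_all();
    result
}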
§Honorable Mention: Zero Sized Types (ZSTs)
TL;DR: be sure that size_of::<T>() == 0 is handled reasonably for allocations and offsets!
Ok, so this one actually shows up in a few other languages (at the very least Swift and Go), but it's really cool and I want to talk about it. To be fair, I think these problems are the worst in Rust.
Rust lets you create types that have nothing in them and thus have zero size; for example, the empty tuple (). Instances of these types are completely useless; after all, they hold no state. However, they serve two purposes:
- As static encodings of properties/facts
- As ways to "throw away" parts of a generic design
As an example of (1), the author of the Diesel ORM often brags that building up SQL queries in Diesel actually just builds up a really complicated zero-sized type, which the compiler then statically resolves for minimum runtime overhead.
As an example of (2), HashSet<T> is implemented as a trivial wrapper around HashMap<T, ()>. Because the language knows instances of () are useless, all the code in HashMap that manipulates values is completely eliminated, producing an optimized HashSet implementation. As a more extreme example, Vec<()> is basically just a glorified counter.
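You can see that counter behavior directly in safe code:

fn main() {
    // Pushing a zero-sized value never allocates; it just bumps the length.
    let mut v = Vec::new();
    for _ in 0..1000 {
        v.push(());
    }
    assert_eq!(v.len(), 1000);
    assert_eq!(std::mem::size_of::<()>(), 0);
}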
However, in low-level (unsafe) code, zero-sized types can be a bit hazardous for two reasons (see the sketch after this list):
- If you try to malloc some number of zero-sized types naively, you'll pass an allocation size of 0 to the allocator, which is implementation-defined at best. Also, you want to avoid allocating these things anyway!
- If you try to offset a pointer to a zero-sized type naively, you will get nowhere. This breaks a C-style "two pointers moving towards each other" iterator.
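The usual workaround, sketched: branch on size_of and treat the pointer as a bare counter when the type is zero-sized (bump is a hypothetical helper):

use std::mem::size_of;

fn bump<T>(ptr: *const T) -> *const T {
    if size_of::<T>() == 0 {
        // Offsetting by 0 bytes would never advance, so step the address itself.
        (ptr as usize + 1) as *const T
    } else {
        // For normal types, a plain one-element offset.
        unsafe { ptr.add(1) }
    }
}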
I discuss how to handle these cases in the nomicon.
To the best of my memory (which is bad), the stdlib has never shipped with ZST bugs in it, but I've certainly caught these bugs when reviewing patches. Subtle details about ZSTs and allocations were common enough issues in contributions that I ended up building the RawVec abstraction to handle all those details for everyone in the stdlib.
Since I mentioned Go and Swift, I'll briefly note that they do slightly different things here.
Swift decouples size from stride. ZSTs have a stride of 1 (byte), and allocations and offsets key off stride. So this means an Array of 100 empty tuples will allocate a buffer of 100 bytes that are never read/written. This is effectively the same solution that C++ has for empty types, but the decoupling of size and stride lets Swift stuff a bunch of zero-sized types in a struct and still declare that the struct is zero-sized with a stride of 1. I believe C++ would be forced to give every ZST in the struct its own byte.
It seems like Go is closer to Rust in legitimately optimizing away ZSTs, but I'm not super confident in the details. Because Go is garbage-collected, it seems like it needs to handle a lot of the details here for you, so ZSTs aren't as big of a concern in practice.
This blog post seems to have some interesting details about ZSTs in Go, but I don't know how accurate it is.
§Honorable Mention: Variance
TL;DR: be sure to eat all of your PhantomDatas for breakfast!
Variance is the thing that makes it so that you can pass a Cat to a function that expects an Animal in an object-oriented language, but not an ArrayList<Cat> where an ArrayList<Animal> is expected. In Rust, variance is applied to lifetimes: you can pass a long-lived thing to a function that expects short-lived things.
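A tiny demonstration of that lifetime variance in safe code:

// pick demands two references with the same lifetime 'a, but callers can
// pass a longer-lived reference: &'long u32 coerces to &'short u32.
fn pick<'a>(x: &'a u32, y: &'a u32) -> &'a u32 {
    if *x > *y { x } else { y }
}

fn main() {
    let long_lived = 10; // lives for all of main
    {
        let short_lived = 20;
        // 'a shrinks to the shorter of the two input lifetimes.
        println!("{}", pick(&long_lived, &short_lived));
    }
}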
Rust is already relatively unique in that it actually exposes type variance as something user-defined types can have (&MyType<&'long T> can potentially be passed where &MyType<&'short T> is expected). This is because variance is usually made unsound by shared mutability: if you turn an ArrayList<Cat> into an ArrayList<Animal>, and someone puts a Dog in, then anyone still holding the ArrayList<Cat> will interpret that Dog as a Cat! As it turns out, Rust's type system is incredibly good at managing aliasing and mutability, so it's easier for us to use variance in more places.
However, Rust mostly handles variance in a manner similar to Sync: it's automatic and based on the values stored in the type. This leads to a similar scenario to Sync: once you start using unsafe code, the system can get confused and give you variance when you shouldn't have it.
The solution to this is to use PhantomData whenever your type stores values that Rust can't "see". PhantomData<T> is a cool little language feature that lets you tell the compiler "look, I'm not gonna directly store this type, but just pretend I did". It's zero-sized, so it has no impact on your type at runtime.
For instance:
struct MySmahtPointer<T> {
    allocation: *mut u8,       // Something complex we can't do in the type system
    _boo: PhantomData<*mut T>, // But hey, there's some mutably shared T's in there!
}
Pre-1.0, Rust actually used to have markers that explicitly requested a "kind" of variance, but it was deemed too confusing in practice. The PhantomData system was adopted because it lets you just "say what you're doing" and have the compiler figure the variance out for you.
Note that in the above example, removing the PhantomData would be a compilation error, as we refuse to let you claim you're generic over a type without actually telling us how you're using it. However, there are more subtle definitions which may lead to incorrect results, like this:
struct MySmahtPointer<T> {
    allocation: *mut u8, // Something complex we can't do in the type system
    aux_value: T,        // Compiler thinks this is the only way we're using T,
                         // and gives us variance incorrectly.
}
I discuss this situation in more detail here and here.
§That's All I Can Think Of!
Lemme know if you can think of any others!