Crust of Rust: Smart Pointers and Interior Mutability

Introduction

In this section, the speaker introduces the topic of smart pointers and interior mutability in Rust. They discuss some of the types that are commonly used in Rust, such as Arc, Rc, RefCell, Mutex, Cell, Deref, the AsRef trait, the Borrow trait, Cow, and Sized.

  • The speaker also provides information on where to find recordings of their streams and how to get announcements for upcoming streams.
  • They introduce a new sub-channel on the Discord server of Rustacean Station, a community podcast for Rust.

Types of Smart Pointers

In this section, the speaker discusses some of the common types of smart pointers in Rust.

Common Types

  • The speaker mentions some common types such as Arc, Rc, RefCell, Mutex, Cell, Deref, and the AsRef trait.
  • They explain that they will try to implement some of these types themselves during the stream.

Implementing Smart Pointers

In this section, the speaker explains that they will be implementing various smart pointer types during the stream.

Implementation Plan

  • The speaker plans to implement Rc, RefCell, and Cell.
  • They also plan to discuss Arc, Mutex, AsRef, Borrow, Cow, and Sized if time permits.
  • The implementation will help viewers understand intermediate concepts in Rust.

Resources for Learning Rust

In this section, the speaker provides resources for learning more about Rust.

Resources Available

  • The speaker posts all recordings on YouTube after each stream.
  • There are older videos online that viewers can watch.
  • Following them on Twitter is the way to give input on episodes and to get announcements of upcoming streams.
  • A sub-channel has been added to the Discord server of Rustacean Station, a community podcast for Rust.

Introduction to Cell

In this section, the speaker introduces the concept of interior mutability and how it relates to the cell module in Rust.

Interior Mutability

  • The speaker explains that a type with interior mutability looks immutable from the outside but has methods that allow you to mutate it.
  • They explain that the cell module provides various container types that allow you to do this in a controlled fashion under certain constraints.

Cell Module

  • The std::cell module provides shareable mutable containers.
  • It allows you to have a shared reference to something while someone else also has a reference, yet you're still allowed to mutate it.

Interior Mutability and Cell

In this section, the speaker discusses interior mutability and exclusive or shared references. They then introduce the concept of a cell in Rust.

Introduction to Cell

  • A cell provides interior mutability in Rust.
  • You can create a new cell with a value of some type T.
  • The set method allows you to change the value inside the cell using an immutable reference.
  • The swap method lets you swap values between two cells.
  • The replace method replaces the value inside the cell with a new one.
  • The into_inner method consumes self and returns the value inside, so it requires ownership of the cell.
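
The methods listed above can be tried out with the standard library's Cell directly. This is only a small sketch: the function name cell_api_demo and the values are made up for illustration, while the Cell methods are the real std::cell::Cell API.

```rust
use std::cell::Cell;

// Hypothetical helper that walks through the Cell methods listed above.
fn cell_api_demo() -> (i32, i32, i32) {
    let a = Cell::new(1);
    let b = Cell::new(2);

    // set only needs a shared reference (&self), not &mut self.
    a.set(10);

    // swap exchanges the contents of two cells: a holds 2, b holds 10.
    a.swap(&b);

    // replace stores a new value and hands back the old one.
    let old = b.replace(20);

    // get copies the value out; into_inner consumes the cell itself.
    (a.get(), old, b.into_inner())
}
```

Calling cell_api_demo() should give back (2, 10, 20). Note that neither a nor b is declared mut.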

Restrictions of Cell Types

  • Different types of cells have different restrictions on what can be stored inside them and how they can be used.
  • As you move towards mutex, there is more freedom to store whatever you want but also more overhead involved in making it work.
  • Box does not provide interior mutability. If you have a shared reference to a box, you cannot mutate its contents.
  • There is no way to tell externally from a type whether it has interior mutability.

Safety Features of Cell

  • With Cell, there is no way to get a reference to what's inside the cell itself. This means that if no one else has a pointer to the value stored in the cell, changing it is safe.
  • Cell does not implement Sync. This means that if you have a reference to a Cell, you cannot give that reference to another thread.

Why can't we mutably borrow more than once for Rc?

In this section, the speaker explains why it is not possible to mutably borrow more than once for Rc.

Cell and Exclusive Reference

  • In general you do not use an exclusive reference with a Cell.
  • If you have an exclusive reference to the Cell, you can get an exclusive reference to the value inside. While that reference is alive, nothing else can get or change the value.
  • The benefit of using a Cell is that you can have multiple shared references to a thing. For example, the Cell, or pointers to it, might be stored in multiple places in a data structure such as a graph where some of the nodes share a value. You then have multiple references to the same thing, but because the code is single-threaded, you know that only one of them is in use at a time.
  • Cell should usually just be used for small copy types.

What are cells used for?

In this section, the speaker explains what cells are used for.

Using Cells

  • You generally want to use cell with types that are copy and relatively cheap to copy out because that's the only way you can get their values.
  • It is usually used for smaller values like numbers or flags that need to be mutated from multiple different places.
  • It is often used with thread locals, where only one thread accesses the value and you want to keep some thread-local state such as a flag or a counter. The thread local only gives you a shared reference, because one thread might try to get the thread local multiple times, and Cell is a good way to provide mutability through it.
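
The thread-local use case could look roughly like the sketch below. The names CALLS and record_call are invented for this example; thread_local! and LocalKey::with are standard library features.

```rust
use std::cell::Cell;

thread_local! {
    // Each thread gets its own counter, starting at zero.
    static CALLS: Cell<u32> = Cell::new(0);
}

// Hypothetical helper: bumps this thread's counter and returns the new value.
fn record_call() -> u32 {
    // `with` only hands us a shared reference to the Cell,
    // yet set still works because Cell has interior mutability.
    CALLS.with(|calls| {
        calls.set(calls.get() + 1);
        calls.get()
    })
}
```

Each call on the same thread returns one more than the previous call, while a freshly spawned thread starts counting from 1 again.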

Implementing Cell

In this section, the speaker explains how to implement a cell.

Creating a Cell

  • To create a cell, the value field has to be an UnsafeCell, because that is the only way to mutate something through a shared reference.
  • You cannot do this with the type system alone; the implementation also relies on the cell not being Sync.

Basic API

  • The basic API for creating a cell includes:
  • A new function which takes a value of type T and returns self. This gives us a cell that contains the value that was given.
  • A set function which takes a shared reference to self and a value T. Written naively as self.value = value, it does not compile, because self.value is behind a shared reference and so cannot be modified.
  • A get function which returns T. This is the basic API we are going for.

Conclusion

In this section, the speaker concludes by summarizing what they've discussed in previous sections.

Summary

  • Cells are used for smaller values like numbers or flags that need to be mutated from multiple different places.
  • Unsafe cells are used when creating cells because they allow us to mutate something through a shared reference.
  • The basic API for creating cells includes new, set, and get functions.

Introduction to Unsafe Cell

In this section, the speaker introduces UnsafeCell and explains how dereferencing the raw pointer it hands out lets Cell mutate its value.

Using Unsafe Cell

  • In new, the value field is just UnsafeCell::new(value).
  • UnsafeCell::get gives back a raw pointer, and dereferencing a raw pointer is rejected at first because the compiler doesn't know whether anyone else is mutating that value.
  • Writing "unsafe" here tells the compiler that we have checked that no one else is currently mutating this value. The compiler does not verify that claim, though: the code is still wrong if two threads try to write to the value at the same time.

Implementing !Sync for Cell<T>

  • We need Cell<T> to be not Sync, which tells the compiler that a Cell can never be shared across threads.
  • UnsafeCell itself is not Sync, which means that Cell is not Sync either. Therefore, code that tries to share a Cell between threads will be rejected.

Recap of Unsafe Cell

In this section, the speaker summarizes what was covered in the previous section about unsafe cells.

Summary of Unsafe Cell

  • The cell type allows you to modify a value through a shared reference because no other threads have a reference to it.
  • Get returns a copy of the value stored inside so even if we change the value, we don't have to invalidate any references because there are no references outside.
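
Putting the points above together, a minimal Cell might be sketched as follows. The type is called MyCell here only to avoid clashing with std::cell::Cell; it is an illustration of the approach described, not the standard library's actual source.

```rust
use std::cell::UnsafeCell;

pub struct MyCell<T> {
    value: UnsafeCell<T>,
}

// UnsafeCell<T> is !Sync, so MyCell<T> is automatically !Sync as well.

impl<T> MyCell<T> {
    pub fn new(value: T) -> Self {
        MyCell {
            value: UnsafeCell::new(value),
        }
    }

    pub fn set(&self, value: T) {
        // SAFETY: no other thread can be mutating self.value (MyCell is !Sync),
        // and we never hand out references to it, so none can be invalidated.
        unsafe { *self.value.get() = value };
    }

    // Only get needs the Copy bound, so the bound lives on this method alone.
    pub fn get(&self) -> T
    where
        T: Copy,
    {
        // SAFETY: only this thread can mutate the value (MyCell is !Sync),
        // and this thread is busy executing get instead.
        unsafe { *self.value.get() }
    }
}
```

Two shared references to the same MyCell can both call set, and a later get observes whichever value was stored last.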

Cell and Sync

In this section, the speaker discusses how to use cell and sync in Rust.

Using Cell to Get a Reference

  • Suppose get handed out a reference instead of a copy.
  • Put a vector in a Cell and get a reference to the first thing inside the vector.
  • Then call set with a new, blank vector.
  • If you then try to print first, it is not okay even though the code is single-threaded: once you call set, the old vector is gone, so first should have been invalidated.

Disallowing Unsafe Implementation of Sync

  • Cell must not have an unsafe implementation of Sync, and it must never give out a reference to the value inside.
  • get only returns a copy and we never give out a reference, which means that set is always safe.

Demonstrating Broken Code

  • There isn't really a good way to demonstrate the broken code, even though it is broken: the two threads both modify the value in place, and the only problem is that you don't know which value it ends up being set to.
  • One way could be having one thread that tries to set the whole array value as one and another thread sets it as two.

The Problem with Interleaving Threads

In this section, the speaker discusses the problem of interleaving threads and how it can lead to corrupted arrays.

Interleaving Threads

  • When two threads are modifying the same bit of memory at the same time, there is no guarantee that they won't step on each other.
  • If one thread writes its value and then goes to sleep while another thread runs for a while and then goes to sleep, we'll see an interleaving of values.
  • This can result in a corrupted array that contains some values that were never set by either thread.
  • The underlying memory system may be fast enough that these interleavings don't show up in practice, but it's still a potential problem.

Demonstrating the Problem

  • By running a test where two threads increment a shared variable 100,000 times each, we can demonstrate how interleaving threads can cause lost modifications.
  • Because both threads are modifying the same value at the same time, some of their modifications end up being lost.
  • This results in an incorrect final value for the shared variable.
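
The lost-update effect can be reproduced in safe Rust with a deliberately non-atomic read-modify-write. Note that this is a stand-in technique, not the test from the stream: it uses separate load and store calls on an AtomicUsize rather than a Cell that has been forced to be Sync, so it stays free of undefined behavior. The function name racy_increments is invented for this sketch.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical helper: two threads each "increment" a shared counter
// per_thread times using a separate load and store.
fn racy_increments(per_thread: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // Both threads may read the same old n here...
                    let n = counter.load(Ordering::Relaxed);
                    // ...and both then store n + 1, losing one increment.
                    counter.store(n + 1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}
```

With per_thread set to 100,000 the result can never exceed 200,000, and on a multi-core machine it usually comes out lower. Replacing the load and store pair with a single fetch_add makes every increment count.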

Non-Copy Types and Trait Bounds

In this section, the speaker discusses why non-Copy types are allowed in a Cell even though get requires Copy.

Copy vs. Non-Copy Types

  • Copy types are those that implement Rust's Copy trait and can be duplicated with a plain bitwise copy.
  • Non-Copy types can still be stored in a Cell, because new and set never need to copy the value out.
  • Generally, only the methods that actually need a trait bound should carry it. Here that is get, which requires T: Copy.

Trait Bounds for Safety

  • When writing unsafe code like that in Cell, it's important to document why each unsafe block is sound.
  • For set, we know that no other thread is concurrently mutating the Cell because it's not Sync.
  • Similarly, for get, we know that the value is not being modified by anyone else: only this thread can mutate it, and this thread is executing get instead.

Unsafe Cell and Ref Cell

In this section, the speaker explains what Unsafe Cell and Ref Cell are in Rust programming language.

Unsafe Cell

  • Unsafe cell is a special type in Rust that allows you to mutate data even if you have a shared reference.
  • The only way to correctly go from a shared reference to an exclusive reference is through unsafe cell.
  • You cannot cast a shared reference into an exclusive reference without going through unsafe cell because the compiler might optimize your code in such a way that it breaks.

Ref Cell

  • RefCell provides safe, dynamically checked borrowing, which is useful for structures like graphs and trees where there might be cycles.
  • It lets you check at runtime whether anyone else is mutating the value.
  • Borrow checking is normally done at compile time, but RefCell lets you do it at runtime.
  • At any point there can be either one exclusive reference or any number of shared references, but never both.
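
The runtime check can be observed with the standard library's RefCell. The function name refcell_demo is made up for this sketch; borrow, borrow_mut, try_borrow, try_borrow_mut, and into_inner are real std::cell::RefCell methods.

```rust
use std::cell::RefCell;

// Hypothetical helper showing the runtime borrow rules.
fn refcell_demo() -> (bool, bool, Vec<i32>) {
    let shared = RefCell::new(vec![1, 2]);

    // Any number of shared borrows may coexist, but an exclusive
    // borrow is refused while they are alive.
    let exclusive_refused = {
        let _r1 = shared.borrow();
        let _r2 = shared.borrow();
        let refused = shared.try_borrow_mut().is_err();
        refused
    };

    // While an exclusive borrow is alive, shared borrows are refused.
    let shared_refused = {
        let mut writer = shared.borrow_mut();
        writer.push(3);
        let refused = shared.try_borrow().is_err();
        refused
    };

    (exclusive_refused, shared_refused, shared.into_inner())
}
```

Both flags should come back true, and the vector should end up as [1, 2, 3]. The plain borrow and borrow_mut methods panic instead of returning an error when the rules are violated.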

RefCell and Cell

In this section, the speaker explains how to use RefCell and Cell in Rust.

Borrowing with RefCell and Cell

  • borrow and borrow_mut cannot hand out references unconditionally; they first have to check a state field.
  • If the state is unshared, we can give out either kind of reference.
  • If the value is exclusively borrowed out, it's not okay to give out a shared reference. Similarly, if any reference has been given out, it's not okay to give out an exclusive reference.
  • We need to set self.state to exclusive if we give out an exclusive reference. Similarly, if the state was unshared and we give out a shared reference, we need to record that it is now shared.

Using Cell instead of RefCell

  • Modifying the reference state like this is not thread-safe: multiple threads might both read the old n and both set the new n to n + 1, and you would end up losing one of the increments.
  • We can store the state in a Cell, because Cell gives us exactly what we need: the ability to mutate something through a shared reference.
  • With something like rayon, RefCell and Cell would make no sense for data shared between worker threads, since neither is Sync.

Safety Argument

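  • Handing out a shared reference is safe because no exclusive references have been given out, since the state would otherwise be exclusive. Handing out an exclusive reference is safe because no other references have been given out, since the state would otherwise be shared or exclusive.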

Ref Type and RefMut Type

In this section, the speaker discusses the need for a Ref type and a RefMut type to track outstanding borrows. They explain that these types contain a reference to the RefCell.

Ref Type and RefMut Type

  • Ref carries a lifetime that ties it to the RefCell it points to.
  • The Ref and RefMut types only need to hold a reference to the RefCell.
  • Drop is implemented for Ref, and it decrements the reference count when the Ref is dropped.
  • If the state is shared with exactly one reference, it becomes unshared when that Ref goes away.
  • Every time someone takes a shared borrow of the inner value, we increment the count and return one of these Refs. When that Ref is eventually dropped, we decrement the count, setting the state either to n - 1 or to unshared if there are now no shared references.

Deref Trait

  • To get to the T from a Ref, they implement the Deref trait, which is what lets the dot operator automatically look deeper into a type.
  • Deref says: given a reference to self, give me a reference to the target type, in this case T. This allows calling any method that requires a reference to T on the Ref.
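
The state tracking, the two guard types, and their Deref and Drop implementations could be sketched together as below. The names MyRefCell, MyRef, MyRefMut, and RefState are chosen for this example, and returning Option from borrow and borrow_mut follows the description in these notes rather than the standard library, whose methods panic instead.

```rust
use std::cell::{Cell, UnsafeCell};
use std::ops::{Deref, DerefMut};

#[derive(Copy, Clone)]
enum RefState {
    Unshared,
    Shared(usize),
    Exclusive,
}

pub struct MyRefCell<T> {
    value: UnsafeCell<T>,
    state: Cell<RefState>,
}

impl<T> MyRefCell<T> {
    pub fn new(value: T) -> Self {
        MyRefCell { value: UnsafeCell::new(value), state: Cell::new(RefState::Unshared) }
    }

    pub fn borrow(&self) -> Option<MyRef<'_, T>> {
        let shared = match self.state.get() {
            RefState::Unshared => 1,
            RefState::Shared(n) => n + 1,
            RefState::Exclusive => return None,
        };
        self.state.set(RefState::Shared(shared));
        Some(MyRef { refcell: self })
    }

    pub fn borrow_mut(&self) -> Option<MyRefMut<'_, T>> {
        if let RefState::Unshared = self.state.get() {
            self.state.set(RefState::Exclusive);
            Some(MyRefMut { refcell: self })
        } else {
            None
        }
    }
}

pub struct MyRef<'refcell, T> {
    refcell: &'refcell MyRefCell<T>,
}

impl<T> Deref for MyRef<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: a MyRef only exists while the state is Shared,
        // so no exclusive reference has been given out.
        unsafe { &*self.refcell.value.get() }
    }
}

impl<T> Drop for MyRef<'_, T> {
    fn drop(&mut self) {
        match self.refcell.state.get() {
            RefState::Shared(1) => self.refcell.state.set(RefState::Unshared),
            RefState::Shared(n) => self.refcell.state.set(RefState::Shared(n - 1)),
            RefState::Unshared | RefState::Exclusive => unreachable!(),
        }
    }
}

pub struct MyRefMut<'refcell, T> {
    refcell: &'refcell MyRefCell<T>,
}

impl<T> Deref for MyRefMut<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: see the safety comment for DerefMut.
        unsafe { &*self.refcell.value.get() }
    }
}

impl<T> DerefMut for MyRefMut<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        // SAFETY: a MyRefMut is only created when no other references exist,
        // and the Exclusive state prevents new ones, so we hold an exclusive lease.
        unsafe { &mut *self.refcell.value.get() }
    }
}

impl<T> Drop for MyRefMut<'_, T> {
    fn drop(&mut self) {
        match self.refcell.state.get() {
            RefState::Exclusive => self.refcell.state.set(RefState::Unshared),
            RefState::Unshared | RefState::Shared(_) => unreachable!(),
        }
    }
}
```

Because the guards reset the state in Drop, a borrow is released as soon as its guard goes out of scope, and the next borrow_mut succeeds again.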

RefCell and Rc

In this section, the speaker explains RefCell and Rc in Rust programming language.

RefCell

  • RefCell allows data to be mutated through shared references, with the borrowing rules checked at runtime.
  • The safety argument for RefMut: a RefMut is only created if no other references have been given out. Once it is given out, the state is set to exclusive, so no future references are given out. We therefore have an exclusive lease on the inner value, so both mutable and immutable dereferencing are fine.
  • The safety comment on Deref simply says: see the safety comment for DerefMut.
  • It's common practice to write safety comments for every unsafe use.

Rc

  • Rc provides shared ownership of a value of type T allocated on the heap.
  • Shared references in Rust disallow mutation by default, and Rc is no exception.
  • Rc never provides mutability. All it does is allow you to have multiple shared references to a thing and only deallocate it when the last one goes away.
  • Rc is not thread-safe.
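
Since Rc on its own never provides mutability, it is commonly paired with RefCell when the shared value has to change. The sketch below uses the standard library types; the function name rc_demo and the values are invented for illustration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical helper: two owners of one vector, mutated through RefCell.
fn rc_demo() -> (usize, usize, Vec<i32>) {
    let a = Rc::new(RefCell::new(vec![1]));

    // Cloning an Rc bumps the reference count; the vector is not copied.
    let b = Rc::clone(&a);
    let while_shared = Rc::strong_count(&a);

    // Rc only gives shared access, so mutation goes through the RefCell.
    b.borrow_mut().push(2);

    // Dropping one owner decrements the count but keeps the value alive.
    drop(b);
    let after_drop = Rc::strong_count(&a);

    let contents = a.borrow().clone();
    (while_shared, after_drop, contents)
}
```

The counts should read 2 and then 1, and the push made through b is visible through a.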

Rc and Smart Pointers

In this section, the speaker explains the difference between weak and strong pointers. They then introduce Rc (reference counted) smart pointers and explain how they work.

Weak vs Strong Pointers

  • A weak pointer will not prevent an object from being deleted, whereas a strong pointer will.
  • Weak smart pointers need to be upgraded to a real pointer before use, but this upgrade can fail.
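
With the standard library's Rc, the weak and strong distinction looks like this. The function name weak_demo is made up; Rc::downgrade and Weak::upgrade are the real API.

```rust
use std::rc::{Rc, Weak};

// Hypothetical helper: upgrade succeeds while a strong pointer exists
// and fails once the last strong pointer is gone.
fn weak_demo() -> (Option<i32>, Option<i32>) {
    let strong = Rc::new(5);
    let weak: Weak<i32> = Rc::downgrade(&strong);

    // The value is still alive, so upgrade yields a temporary Rc.
    let before = weak.upgrade().map(|rc| *rc);

    // The weak pointer does not keep the value alive.
    drop(strong);
    let after = weak.upgrade().map(|rc| *rc);

    (before, after)
}
```

The first upgrade should yield Some(5) and the second None.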

Introduction to Rc Smart Pointers

  • An Rc is a pointer to some type T that is stored on the heap.
  • The value needs to be stored on the heap because if multiple functions in the code reference it, it cannot live on the stack of any one function.
  • The reference count has to live in the value that is shared amongst all copies of the Rc.
  • An RcInner holds both the value and the reference count.

Implementing Clone for Rc<T>

  • Clone for Rc<T> is implemented by incrementing the reference count and returning another Rc that points to the same inner value.
  • Deref is implemented for Rc<T> as well.

Unsafe Blocks

  • Rc cannot simply hold a Box: dropping the first Rc would drop the Box and free the memory prematurely, while other Rcs still point to it. It holds a raw pointer instead, so Deref has to use an unsafe block.
  • The compiler doesn't know whether a pointer inside an Rc is still valid or not.
  • Therefore, Deref needs an unsafe block where we assert that inner is only deallocated when the last Rc goes away.

Rust Reference Types

In this section, the speaker discusses reference types in Rust and the semantics that must be followed when using them.

Reference Types

  • &mut T, *mut T, *const T: the *mut and *const forms are not references but raw pointers.
  • & means a shared reference. &mut means an exclusive reference.
  • The star versions, *const and *mut, do not carry those guarantees. If you have a raw pointer, the only thing you can really do with it is use an unsafe block to dereference it and turn it into a reference.
  • The difference between *const and *mut is fuzzy. A *mut is usually something that you might be able to mutate, something you might have an exclusive reference to, whereas a *const is intended to signify that you will never mutate through it.
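
A small sketch of the difference in practice: creating and casting raw pointers is safe, and only dereferencing them needs an unsafe block. The function name raw_pointer_demo is invented for this example.

```rust
// Hypothetical helper showing *mut and *const side by side.
fn raw_pointer_demo() -> (i32, i32) {
    let mut x = 10;

    // Creating raw pointers is safe; nothing is promised about them yet.
    let p_mut: *mut i32 = &mut x;
    let p_const: *const i32 = p_mut as *const i32;

    // SAFETY: both pointers refer to x, which is alive for this whole
    // function, and no reference to x is in use while we go through them.
    unsafe {
        *p_mut += 1;
        let read = *p_const;
        (read, *p_mut)
    }
}
```

Both reads should observe 11, since the two pointers refer to the same integer.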

Box in Rust

In this section, the speaker discusses what box provides for us in Rust.

Box Provides Heap Allocation

  • Box provides heap allocation, which lets us go from our RcInner, which would otherwise be on the stack, to a pointer to the heap, which is what we store here.
  • For clone we need to increase the reference count, but we have the same problem as we did for RefCell: we have a shared reference to self, yet need to mutate something inside of it. The answer is the same as before: our friend Cell.

Unsafe Keyword in Rust

In this section, the speaker discusses the unsafe keyword in Rust and its meaning.

Unsafe Keyword

  • The unsafe keyword is a little weird, because what it really means is "I have checked that the stuff inside the braces is safe". The programmer certifies that the code is safe, so it is not really unsafe.
  • In some sense it says "I acknowledge that this code looks unsafe, but it is actually safe", so it is a slightly odd name for the keyword.

Smart Pointers in Rust

In this section, the speaker discusses smart pointers in Rust and how to deallocate them.

Deallocating Smart Pointers

  • When the last Rc goes away, we need to actually deallocate the inner value; otherwise there will be a memory leak.
  • We check the count. If the count is one, we are the last Rc being dropped: after us there will be no Rcs and no references to T, so the Box can be dropped. Otherwise there are other Rcs, so don't drop the Box.

Understanding Variance in Rust

In this section, the speaker explains variance in Rust and how it relates to *mut and *const. They also introduce NonNull as a way to optimize code.

Variance in Rust

  • Variance is one of the primary differences between *mut and *const.
  • NonNull is used for optimization purposes because it lets the compiler know that a pointer can't be null.
  • Option<NonNull<T>> uses the null pointer to represent None, resulting in no overhead.
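
The "no overhead" claim can be checked with size_of. The function name pointer_sizes is made up for this sketch, and u64 is an arbitrary choice of pointee type.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

// Hypothetical helper returning the sizes of three pointer-like types.
fn pointer_sizes() -> (usize, usize, usize) {
    (
        size_of::<*mut u64>(),
        // None is represented by the null pointer, so no extra space is needed.
        size_of::<Option<NonNull<u64>>>(),
        // A plain raw pointer may itself be null, so Option needs room for a tag.
        size_of::<Option<*mut u64>>(),
    )
}
```

The first two sizes should be equal, while Option<*mut u64> is typically larger.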

Using NonNull

  • The standard library has a neat type called NonNull that we can use instead of a plain *mut.
  • We give it the *mut T from Box::into_raw, and later get that *mut T back out of it, which is what we need for Box::from_raw.
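
Combining the pieces discussed for Rc (a heap-allocated inner value, a Cell for the count, NonNull for the pointer, and a PhantomData marker for the drop check), a simplified reference-counted pointer might look like the sketch below. The names MyRc and strong_count are chosen for this example, and the sketch leaves out weak pointers and ?Sized support, so it is not the standard library's implementation.

```rust
use std::cell::Cell;
use std::marker::PhantomData;
use std::ops::Deref;
use std::ptr::NonNull;

struct RcInner<T> {
    value: T,
    refcount: Cell<usize>,
}

pub struct MyRc<T> {
    inner: NonNull<RcInner<T>>,
    // Marker telling the drop check that dropping a MyRc may drop an RcInner<T>.
    _marker: PhantomData<RcInner<T>>,
}

impl<T> MyRc<T> {
    pub fn new(value: T) -> Self {
        let inner = Box::new(RcInner { value, refcount: Cell::new(1) });
        MyRc {
            // SAFETY: Box::into_raw never returns a null pointer.
            inner: unsafe { NonNull::new_unchecked(Box::into_raw(inner)) },
            _marker: PhantomData,
        }
    }

    // Hypothetical helper for inspecting the count in examples.
    pub fn strong_count(this: &Self) -> usize {
        // SAFETY: inner stays allocated for as long as any MyRc exists.
        unsafe { this.inner.as_ref() }.refcount.get()
    }
}

impl<T> Clone for MyRc<T> {
    fn clone(&self) -> Self {
        // SAFETY: inner stays allocated for as long as any MyRc exists.
        let inner = unsafe { self.inner.as_ref() };
        inner.refcount.set(inner.refcount.get() + 1);
        MyRc { inner: self.inner, _marker: PhantomData }
    }
}

impl<T> Deref for MyRc<T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: inner is only deallocated when the last MyRc goes away,
        // and we hold a MyRc, so it has not been deallocated.
        let inner = unsafe { self.inner.as_ref() };
        &inner.value
    }
}

impl<T> Drop for MyRc<T> {
    fn drop(&mut self) {
        // SAFETY: we are still a live MyRc, so inner is valid to read.
        let count = unsafe { self.inner.as_ref() }.refcount.get();
        if count == 1 {
            // SAFETY: we are the last MyRc, so nobody else can reach inner;
            // rebuilding the Box frees the allocation.
            drop(unsafe { Box::from_raw(self.inner.as_ptr()) });
        } else {
            // There are other MyRcs, so only decrement the count.
            // SAFETY: inner is still allocated because other MyRcs exist.
            let inner = unsafe { self.inner.as_ref() };
            inner.refcount.set(count - 1);
        }
    }
}
```

Because RcInner contains a Cell and the pointer is a NonNull, MyRc is neither Sync nor Send, which matches the single-threaded design described here.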

Unsafe Pointers and References

In this section, the speaker discusses unsafe pointers and references. They explain why dropping an inner before doing something else is necessary and why mutable pointers are different from mutable references.

Dropping Inner Before Doing Something Else

  • In drop, the reference to inner has to go away before the Box is deallocated, because of how Rust deals with lifetimes.
  • The mutable reference that we return lives only as long as the mutable reference to self.

Mutable Pointers vs Mutable References

  • A *mut pointer does not carry the additional implication that it is exclusive, and exclusivity is what allows you to mutate through something safely.
  • A &mut reference guarantees that no one else is currently accessing the value: it is an exclusive reference.

inner.refcount

In this section, the speaker explains why it's important to add a marker in Rust when a type owns another type. They also discuss how accessing something through a pointer that has just been deallocated can cause issues.

Importance of Adding Markers

  • It's important to add markers in Rust when a type owns another type.
  • If someone later comes along and writes code that accesses something through a pointer that has just been deallocated, the compiler won't warn them that this isn't okay.
  • This is why adding markers is necessary.

Accessing Through Pointers

  • Accessing something through a pointer that we just deallocated can cause issues.
  • If we don't have markers, Rust won't know that this type owns another type.
  • This matters if the owned type contains lifetimes.

Cell, RefCell, and Rc

In this section, the speaker introduces Cell, RefCell, and Rc.

Introduction of Cell, RefCell, and Rc

  • The speaker introduces Cell, RefCell, and Rc.

Drop Check

In this section, the speaker discusses drop check in Rust and why it's important to implement Rc properly.

Importance of Implementing Rc Properly

  • It's important to implement Rc properly because of drop check in Rust.
  • Drop check is fairly complicated, but it's covered in more detail in the Rustonomicon.
  • When we write Rc without the marker, Rust doesn't know that this type owns a T.
  • This matters if the owned type contains lifetimes.

Code Demonstration

In this section, the speaker demonstrates a code example to explain what goes wrong if we don't add markers.

Code Demonstration

  • The speaker demonstrates a code example to explain what goes wrong if we don't add markers.
  • If T is dropped before foo, Rust will catch it as a problem.
  • This is known as the drop check.
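
The kind of situation the drop check guards against can be sketched with a type whose Drop implementation touches borrowed data. This is an illustrative stand-in rather than the exact code from the stream: Foo here borrows a Cell counter, and the function name drop_order_demo is invented.

```rust
use std::cell::Cell;

struct Foo<'a> {
    counter: &'a Cell<u32>,
}

impl Drop for Foo<'_> {
    fn drop(&mut self) {
        // The destructor uses the borrowed value, so that value
        // must still be alive when Foo is dropped.
        self.counter.set(self.counter.get() + 1);
    }
}

// Hypothetical helper: the borrowed value outlives the Foo, which is fine.
fn drop_order_demo() -> u32 {
    let counter = Cell::new(0);
    let foo = Foo { counter: &counter };
    // foo is dropped while counter is still alive.
    drop(foo);
    // Declaring foo before counter, so that counter would be dropped first,
    // is the ordering the drop check is there to reject at compile time.
    counter.get()
}
```

The function should return 1, showing that the destructor ran against a live counter.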

Understanding PhantomData in Rust

This section explains the concept of PhantomData in Rust and how it is used to ensure that dropping an Rc is checked at compile time.

PhantomData

  • PhantomData<T> tells Rust to treat a type as though it holds a T, even if it doesn't store one.
  • It ensures that when we drop an Rc, Rust treats it as possibly dropping a T.
  • The marker only matters when T is not 'static, but we want to allow any T here.
  • The standard library changed its marker from just a T to its internal wrapper type (the equivalent of RcInner), to guard against someone accidentally writing a Drop implementation for that wrapper.

?Sized Types

  • ?Sized is used to opt out of the requirement that every generic argument must be Sized.
  • The CoerceUnsized trait is one of the restrictions on implementing Rc fully yourself if you want to support dynamically sized types.

Benefits of Using Rust Over C

  • Writing convoluted code like this rarely comes up in your own code.
  • In C, this problem would manifest as random crashes at runtime. In Rust, these problems are caught at compile time.

RefCell, Mutex and Arc

This section covers the difference between !Sized and ?Sized. It also explains how the thread-safe counterparts of these types, such as Mutex and Arc, work.

!Sized vs ?Sized

  • !Sized means not sized.
  • ?Sized means the type does not have to be sized.
  • By default, every generic type parameter has a Sized bound.
  • You opt out of that bound by writing ?Sized.
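
A short sketch of what the opt-out buys you: with T: ?Sized a generic function can accept dynamically sized types such as str and slices behind a reference. The function name byte_len is made up; size_of_val is a real standard library function.

```rust
use std::mem::size_of_val;

// Hypothetical helper: without `?Sized`, T could not be `str` or `[u8]`.
fn byte_len<T: ?Sized>(value: &T) -> usize {
    // size_of_val works for unsized values because the length
    // travels with the reference.
    size_of_val(value)
}
```

For example, byte_len("hello") is 5 with T inferred as str, and byte_len(&7u32) is 4 with the ordinary sized type u32.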

Synchronized Versions: RwLock, Mutex and Arc

  • The strategies written so far don't carry over to multiple threads. There is no thread-safe equivalent of Cell: even though it never gives out references, having two threads modify the same value at the same time is just not okay.
  • RefCell is more interesting. In the RefCell we wrote, borrow and borrow_mut return Options. A thread-safe version can use an atomic counter instead of a Cell for the state, since the CPU has built-in instructions that increment and decrement counters in a thread-safe way.
  • The synchronized version of RefCell is usually RwLock.
  • A reader/writer lock (RwLock), which is one of the types in std::sync, is basically a RefCell whose counters are kept using atomics so that they are thread-safe.
  • borrow and borrow_mut are called read and write on the reader/writer lock. They don't return an Option; instead they always return the equivalent of the Ref or RefMut, but they block the current thread until the borrow can succeed.
  • If you call read while an exclusive reference is out, the current thread blocks until that exclusive reference is given up; at that point the thread resumes and you have the shared reference. Similarly, taking the write side of the lock blocks while any shared references are given out, and only stops blocking once there are no more.
  • Mutex is sort of a simplified RefCell with only borrow_mut, so it doesn't need to keep counts of how many shared references there are: either some thread holds a reference or none does. Like RefCell, it hands out a guard, and dropping that guard lets someone else take the lock.
  • Arc (atomically reference counted) is pretty much exactly the same as Rc, except that it uses thread-safe atomic operations to manage the reference count rather than Cell.
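
The thread-safe counterparts can be exercised with a small sketch. The function names locked_increments and rwlock_demo are invented for this example; Arc, Mutex, and RwLock are the std::sync types, whose lock methods return a Result because of lock poisoning, hence the unwrap calls.

```rust
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

// Hypothetical helper: two threads increment one counter behind a Mutex.
fn locked_increments(per_thread: usize) -> usize {
    // Arc gives shared ownership across threads; Mutex gives exclusive access.
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // lock blocks until no other thread holds the guard.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

// Hypothetical helper: write and read play the roles of borrow_mut and borrow.
fn rwlock_demo() -> usize {
    let data = RwLock::new(vec![1, 2, 3]);
    // The write guard is dropped at the end of this statement.
    data.write().unwrap().push(4);
    let len = data.read().unwrap().len();
    len
}
```

Unlike a non-atomic counter, locked_increments(100_000) always comes out as exactly 200,000, because every increment happens while the lock is held.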

Not Send Pointers

  • It is not safe to send an Rc to a different thread, because the count is not thread-safe. If I sent an Rc to some other thread, and that thread dropped its Rc at the same time as I dropped mine, both of us would use the Cell to decrement the count, which is not okay because Cell is not thread-safe. So Rc cannot be Send, and NonNull is indeed not Send by default.

Rc vs Arc

This section discusses why one might prefer Rc over Arc.

Cost of Using Atomics

  • Rc is cheaper than Arc.
  • Atomics are more expensive in terms of CPU cycles and coordination overhead between cores.
  • The non-thread-safe versions are preferred when they suffice, because they have lower overhead.

Asynchronous Mutexes

  • async-std, Tokio, the futures crate, and the futures-intrusive crate all have asynchronous mutexes.

The Cow Type

This section covers the Cow type and its implementation.

Copy-On-Write

  • The Cow type is an enum that is either owned or borrowed.
  • If a Cow<T> contains a reference to a T, it passes access through to it. If it owns the thing it contains, it gives you a reference to that.
  • If you want to modify the value inside a Cow and it holds a shared reference, you can't modify it in place, because it's shared.
  • To modify the value while it is borrowed, call to_mut, which clones the value and turns the Cow into the owned variant.

Benefits of Cow Type

  • The benefit of using Cow for string operations shows when most of the time you don't need a copy because you're only going to read, but sometimes you need to modify the value.
  • Only if we do have to change something do we do cloning and mutation.
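
A typical shape for this is a string function that only allocates when it has to change something. The function names underscore_spaces and shout are invented for this sketch; Cow and its to_mut method come from std::borrow.

```rust
use std::borrow::Cow;

// Hypothetical helper: borrows the input unless it actually contains spaces.
fn underscore_spaces(input: &str) -> Cow<'_, str> {
    if input.contains(' ') {
        // Something has to change, so allocate a new String.
        Cow::Owned(input.replace(' ', "_"))
    } else {
        // Nothing to change: pass the borrow straight through.
        Cow::Borrowed(input)
    }
}

// Hypothetical helper: to_mut clones a borrowed value into an owned one
// the first time mutation is needed.
fn shout(mut text: Cow<'_, str>) -> Cow<'_, str> {
    text.to_mut().push('!');
    text
}
```

Passing "abc" to underscore_spaces gives back a Cow::Borrowed without allocating, while "a b" produces a Cow::Owned holding "a_b".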

from_utf8_lossy

This section discusses why from_utf8_lossy returns a Cow but the other from_utf8 variants don't.

from_utf8_lossy

  • The String type in the standard library has a function called from_utf8_lossy.
  • If the given byte string is completely valid UTF-8, it passes it straight through.
  • It returns a Cow<str>: if nothing has to be modified, the input is passed through as a borrow, and only if something has to change does it clone and mutate.
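
Both outcomes can be observed directly. The function name lossy_demo and the byte strings are made up for this sketch; String::from_utf8_lossy is the real standard library function.

```rust
use std::borrow::Cow;

// Hypothetical helper comparing valid and invalid input.
fn lossy_demo() -> (bool, bool, String) {
    // Valid UTF-8 is passed through without allocating.
    let valid = String::from_utf8_lossy(b"hello");
    // The invalid byte 0xFF is replaced by U+FFFD, which needs a new String.
    let invalid = String::from_utf8_lossy(b"hi\xFF");
    (
        matches!(valid, Cow::Borrowed(_)),
        matches!(invalid, Cow::Owned(_)),
        invalid.into_owned(),
    )
}
```

The valid input should come back as Cow::Borrowed, and the invalid input as Cow::Owned containing "hi" followed by the replacement character.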

Cow Type and Smart Pointers

This section covers the cow type and smart pointers in Rust.

Cow Type

  • The cow type allows you to avoid allocation if you don't need to modify.
  • Other from_utf8 variants usually allocate regardless, but with the Cow type there is no reason to do so.

Smart Pointers

  • Rust has several smart pointer types.
  • Cell is for non-thread-safe interior mutability that never hands out references.
  • RefCell is for dynamically checked interior mutability.
  • Rc is for dynamically shared references, where you don't know until runtime how many references there are going to be or when the inner value will be dropped.
  • There are also thread-safe versions of these types, called synchronized versions.
  • Cow is not really a smart pointer but kind of a copy-on-write pointer that upgrades when needed.

Recap and Next Steps

This section provides a recap of what was covered in the previous section and discusses what will be covered next.

Recap

  • The previous section covered the cow type and smart pointers in Rust.

Next Steps

  • The next stream may cover trait objects, the Borrow trait, or trait delegation.

Video description

In this fourth Crust of Rust video, we cover smart pointers and interior mutability, by re-implementing the Cell, RefCell, and Rc types from the standard library. As part of that, we cover when those types are useful, how they work, and what the equivalent thread-safe versions of these types are. In the process, we go over some of the finer details of Rust's ownership model, and the UnsafeCell type. We also dive briefly into the Drop Check rabbit hole (https://doc.rust-lang.org/nightly/nomicon/dropck.html) before coming back up for air. This is definitely a more technically advanced stream than some of the earlier ones, and may be a little harder to follow. I apologize for that in advance! Please do leave questions here or on Discord and I'll try to help explain what's going on.

You can find the final code at https://gist.github.com/jonhoo/7cfdfe581e5108b79c2a4e9fbde38de8 and the Discord at https://discord.gg/RJdqQ9n

0:00:00 Introduction
0:01:11 Discord
0:02:31 Agenda
0:03:50 Interior Mutability
0:07:47 Cell
0:23:39 Trying to Test Cell
0:40:17 UnsafeCell
0:41:21 RefCell
0:54:21 RefCell Smart Pointer
1:06:27 Rc (reference counted ptr)
1:23:49 NonNull
1:31:55 PhantomData and Drop Check
1:44:25 ?Sized Briefly
1:47:30 Thread Safety
1:54:20 Copy-on-Write (Cow)

You can watch the live version with comments at https://www.youtube.com/watch?v=1e5aDptlGoI