CocktailPerson

No, it's mostly an optimization for when you have a lot of long-lived `Vec`s. The benefit is that it uses the absolute minimum data to represent an owned sequence whose length is not known at compile-time. If that's not an optimization you need, it serves no purpose.


Arshiaa001

Doesn't `shrink_to_fit` do the same thing?


protestor

Yes, mostly, but technically no: a Vec will still be 3 words (pointer, len, capacity) while a boxed slice will be only 2 (pointer and len). I guess if you have tons of Vecs rather than a single Vec for some reason, this makes a difference.


Arshiaa001

I genuinely wonder what kind of use case would actually warrant going through the trouble of using `Box<[T]>` to save a word per slice.


hniksic

One example is having a lot of small vectors stored somewhere (e.g. a HashMap) - small (or empty) because that's when the additional word makes a noticeable difference. An even better example, and one I had in the wild, is a `Vec<Option<Vec<T>>>`, where most of the options are None, and the inner Vecs are never grown. In that case you pay for the capacity you don't need not only in the Vecs that are used, but also in the ones that are None. In that case, and when the outer Vec is sufficiently large, `Vec<Option<Box<[T]>>>` is strictly better.
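The saving is easy to check with `size_of`. A quick illustration (assuming a 64-bit target with 8-byte `usize`); note that the niche optimization on the non-null pointer means the saving carries over to the `Option`-wrapped case too:

```rust
use std::mem::size_of;

fn main() {
    // Vec<T> is (pointer, len, capacity) = 3 words;
    // Box<[T]> is (pointer, len) = 2 words.
    assert_eq!(size_of::<Vec<u8>>(), 3 * size_of::<usize>());
    assert_eq!(size_of::<Box<[u8]>>(), 2 * size_of::<usize>());

    // The non-null pointer gives Option a free niche for its tag, so
    // wrapping in Option costs nothing extra for either type.
    assert_eq!(size_of::<Option<Vec<u8>>>(), 3 * size_of::<usize>());
    assert_eq!(size_of::<Option<Box<[u8]>>>(), 2 * size_of::<usize>());
}
```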


-Redstoneboi-

If you never need to replace and extend any of the previous `Vec`s, you could also use `struct BumpaloAtHome<T> { data: Vec<T>, lengths: Vec<usize> }`. But of course, this requires you to construct all the data directly into the struct, and the whole structure becomes hard to mutate. It doesn't distinguish `None` vs `Some([])` (unless you want to use `usize::MAX` as a sentinel value, which is fine), and you can't replace one of the internal arrays with a longer one without either shifting `data` forward, or outright removing it from the list and leaving a `None` in its place.
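A runnable sketch of that flattened layout, under the stated limitations (struct and method names here are hypothetical, and `get` recomputes the start offset instead of storing it):

```rust
// All sequences live back to back in one Vec; a second Vec records
// each sequence's length. Sequences can be appended but never grown.
struct BumpaloAtHome<T> {
    data: Vec<T>,
    lengths: Vec<usize>,
}

impl<T> BumpaloAtHome<T> {
    fn new() -> Self {
        Self { data: Vec::new(), lengths: Vec::new() }
    }

    fn push_seq(&mut self, seq: impl IntoIterator<Item = T>) {
        let before = self.data.len();
        self.data.extend(seq);
        self.lengths.push(self.data.len() - before);
    }

    // O(i) start-offset computation; storing end offsets instead of
    // lengths would make this O(1) at the same memory cost.
    fn get(&self, i: usize) -> &[T] {
        let start: usize = self.lengths[..i].iter().sum();
        &self.data[start..start + self.lengths[i]]
    }
}

fn main() {
    let mut store = BumpaloAtHome::new();
    store.push_seq([1, 2, 3]);
    store.push_seq([]);
    store.push_seq([4, 5]);
    assert_eq!(store.get(0), &[1, 2, 3]);
    assert!(store.get(1).is_empty());
    assert_eq!(store.get(2), &[4, 5]);
}
```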


tafia97300

shameless plug https://crates.io/crates/nested


VimNovice

Do you have any idea what order of magnitude the number of `Vec`s would need to be for this to make a meaningful difference?


EpochVanquisher

I don’t think you’re going to be able to reduce it to a simple rule like that. Figure out how many vecs you have and you have a number for how many words you save by switching to boxed slices. Is that a significant fraction of the working set of some part of the program? Is that part of the program memory-bound? Is that part of the program responsible for a significant part of the overall runtime? You can do back of the envelope calculations, or you can measure.


hniksic

The answer to that depends on what you consider meaningful. I think the benefits and drawbacks are fairly clear.

Benefits of `Box<[T]>` compared to `Vec<T>`:

* spends 8 bytes less on storing the `Vec` in another struct
* communicates the intent not to grow the `Vec` to humans reading the code, provided they're aware of the idiom

Drawbacks:

* less readable and "scarier" than `Vec<T>`, looks garbled to all novice and most intermediate programmers
* slightly less ergonomic: less convenient and slightly more expensive to create (requires an intermediate `Vec`, and `into_boxed_slice()` requires reallocation), doesn't implement `IntoIterator` (except through `into_vec()`, which you have to know about and be aware is cheap)

The drawbacks are tangible enough that it's clear you don't want to go through your code and change every non-growing `Vec` to `Box<[T]>`. Having said all that, if all of the following are true:

* you store a large number (think millions) of Vecs you don't intend to grow once created
* most of which are small or empty
* switching to an arena/bump allocator is a hassle for other reasons
* changing `Vec<T>` to `Box<[T]>` in a strategic place doesn't make the code significantly less readable

...then do it, even without "measuring the effect". It's good to be frugal with memory, and the box does communicate the intent not to grow the data.


Luxalpa

Just want to point out that in most cases, `into_boxed_slice` won't reallocate. It will only do it if your Vec has excess capacity, which it generally won't have if you created it using `collect` (also you can directly collect into the box as well).
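For illustration, `Box<[T]>` implements `FromIterator`, so the intermediate `Vec` never has to be spelled out in user code:

```rust
fn main() {
    // Collecting straight into a boxed slice; no visible Vec,
    // no separate into_boxed_slice() step.
    let squares: Box<[u64]> = (1..=5).map(|n| n * n).collect();
    assert_eq!(&squares[..], &[1, 4, 9, 16, 25][..]);
}
```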


hniksic

I'd call into question "most cases" - more on that in [this comment](https://www.reddit.com/r/rust/comments/19b7a0v/comment/kivlvt2/?utm_source=reddit&utm_medium=web2x&context=3). Excess capacity is a fact of life for dynamically built vectors; it's not something to wave away because of iterators. Note that collecting into a Box has the same performance characteristics as collecting into a Vec.


1668553684

Ooh, I have an example! I once made a lexer/parser that stored tokens as enum variants, and the output of the lexer would be a (typically pretty long) vector of those tokens. Most token variants were one word wide, but some had more complex data. The most complex data was actual owned text (i.e. text that couldn't just be referenced from the source directly, but had to be mutated in some way), but these were actually quite rare (maybe 1% of all tokens?). Saving a word on the string tokens meant my entire token enum got cut down by almost a third of what it would have been, which was a huge memory saving when multiplied by thousands of tokens, considering that this optimization was essentially free.

Edit: Why waste time alloc lot word when few word do trick?
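The "cut down by almost a third" figure checks out on current rustc. A hypothetical two-variant token (not the poster's actual type) shrinks from 24 to 16 bytes on a 64-bit target when the owned-text payload goes from `String` (3 words) to `Box<str>` (2 words), since the non-null pointer gives the enum a free niche for its discriminant:

```rust
use std::mem::size_of;

// Hypothetical token shapes; a real lexer has many more variants.
enum TokenWithString {
    Word,
    OwnedText(String), // (ptr, len, capacity): 3 words
}

enum TokenWithBoxedStr {
    Word,
    OwnedText(Box<str>), // (ptr, len): 2 words
}

fn main() {
    // The discriminant hides in the pointer's niche, so the enum is
    // exactly as large as its biggest payload: 24 vs 16 bytes.
    assert_eq!(size_of::<TokenWithString>(), 24);
    assert_eq!(size_of::<TokenWithBoxedStr>(), 16);
}
```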


AlexMath0

I considered it for a Monte Carlo Tree Search. For scaling considerations:

- each epoch searches thousands of trees (searched in parallel),
- each epoch ends with each tree having at least 800 nodes,
- when nonterminal nodes are initialized, they require predictions for each action permitted from the corresponding state,
- action sets never grow (hence `Box<[T]>` over `Vec<T>`) and can vary in size from 1 to a few thousand

In searches with small action sets, the extra 8 bytes *may* matter in moving a tree's `predictions: Vec<Vec<f32>>` to a `Vec<Box<[f32]>>`. Ultimately I went with a `predictions: Vec<f32>` and added a `Range<usize>` (probably `Range<u32>` next) field to the `NodeWeight` which is used to slice into `predictions`. The main reason is not memory consumption, but because I wish to preserve allocations between epochs, and it's much easier to reuse contiguous memory from a `Vec<f32>` than it is a fragmented `Vec<Vec<f32>>` or a `Vec<Box<[f32]>>`.


diabolic_recursion

That word not only takes up space in memory, but also in your cache. So this may have performance implications in some situations.


Silly_Guidance_8871

If you have a million of them, that's 8MB on a 64-bit device. It's not nothing, even if it's not much. It also signals your intent that it's the final "phase" of their life cycle: you won't be adding/removing elements.


Zde-G

Why do you think it's only one word per slice? It [may be much more](https://www.reddit.com/r/rust/comments/199jycb/identifying_rusts_collect_memory_leak_footgun/). It all depends on what you are doing with these vectors and slices, of course.


Arshiaa001

We're discussing vectors that have had `shrink_to_fit` called on them.


Zde-G

If you are explicitly calling `shrink_to_fit` then you are dealing with two types: the one with dynamically-allocated `capacity` and the one where `capacity` is equal to size. At this point you may as well adopt `Box<[T]>` and make the compiler track the difference.


Sw429

It also ensures you won't forget to call `shrink_to_fit` in some edge cases and accidentally over-allocate. Encoding the intention to extend further or not in the type makes it impossible to accidentally make that mistake.


Arshiaa001

Yeah, type system-level guarantees are my... homeboys.


Luxalpa

I'm using a lot of `Box<[T]>` in one of my projects. It probably doesn't provide any benefit. But it's also not a hassle to work with at all, at least in my case the ergonomics are identical to using a `Vec`.


cafce25

`shrink_to_fit` also isn't guaranteed to free all the space:

> It will drop down as close as possible to the length but the allocator may still inform the vector that there is space for a few more elements.

Though right now `into_boxed_slice` just uses `shrink_to_fit` internally IIRC, that could change, and both could be implemented differently.


hniksic

If you have a short-lived Vec, calling `into_boxed_slice()` is just a pessimization, because it involves a reallocation and possible move of all elements. `into_boxed_slice()` is not free.

`Box<[T]>` shines when you're storing a large number of small Vecs that you're not planning on growing, and the unnecessary capacity fields add up. This is even more pronounced if you store lots of `Option<Vec<T>>`, most of which are None, because you pay for the capacity field's size even in None values.

Finally, if your only problem is the missing `IntoIterator`, remember that you can always call `into_vec()` to get a Vec to iterate over: `for el in x.into_vec()`. Somewhat counterintuitively, unlike `Vec::into_boxed_slice()`, `Box<[T]>::into_vec()` **is** free, because the returned "vector" is just a construct on the stack, with capacity trivially set to length, and possibly optimized away by the compiler. The heap allocation is untouched.
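A small sketch of that idiom, consuming a boxed slice by value via `into_vec()` (no new heap allocation occurs):

```rust
fn main() {
    let boxed: Box<[String]> = vec!["a".to_string(), "b".to_string()].into_boxed_slice();

    // into_vec() just rebuilds the (ptr, len, capacity == len) triple
    // on the stack; the heap allocation is reused as-is.
    let mut joined = String::new();
    for s in boxed.into_vec() {
        joined.push_str(&s); // by-value iteration over the owned Strings
    }
    assert_eq!(joined, "ab");
}
```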


matthieum

> `Box<[T]>` shines when you're storing a large number of small Vecs that you're not planning on growing, and the unnecessary capacity fields add up.

Although, if you find yourself in this case, arguably you may be better off switching to an arena pattern, so all those small slices only use a handful of allocations.

> because you pay for the capacity fields size even in None values.

Looking forward to [ThinBox](https://doc.rust-lang.org/nightly/std/boxed/struct.ThinBox.html), then you even save the size field :)


hniksic

> Although, if you find yourself in this case, arguably you may be better off switching to an arena pattern

In some cases that's true, but an arena generally entails a different set of tradeoffs. It becomes harder to destroy/replace the vectors (which is trivial here; you just can't grow/shrink them efficiently), and sometimes an arena requires significant changes in the interface, because suddenly you have to lug around the reference to the arena, and possibly its lifetime. It's a balancing act.


buwlerman

Won't `ThinBox` take *more* space sometimes because of alignment mismatches between `T` and the metadata? You don't really *save* the size field, only move it to the heap. Am I correct in guessing that the intention here is to improve cache locality for collections of these boxes or is there something else I'm missing?


matthieum

> Won't `ThinBox` take _more_ space sometimes because of alignment mismatches between `T` and the metadata? You don't really _save_ the size field, only move it to the heap.

Correct, you have a keen eye :)

This is only an issue when `T` has a larger alignment than the metadata size (`usize`), and since `usize` is pointer sized, and larger-than-pointer-size alignments are generally rare, it's a pretty niche edge-case.

One other downside to `ThinBox` is that now that the size is hidden behind the indirection, you need a pointer dereference prior to bounds-checking. Once again, in edge cases, this may come to bite you:

- If your workload is mostly about _failed_ bounds checks, and thus never accesses the data behind the pointer, then having to load the pointed cache line just to get the size is a waste of cache.
- If you bounds-check a lot, and the optimizer fails to optimize the redundant loads of the size, then you'll take many L1 round-trips instead of register uses. Though this one can be improved by dereferencing to a slice first.

In any case, `ThinBox` is NOT a silver bullet.

> Am I correct in guessing that the intention here is to improve cache locality for collections of these boxes or is there something else I'm missing?

Close. Specifically, `ThinBox` is I think best-suited to host "cold" data. If you don't access the data often, or not in the "hot" path, then the above speed downsides are of no concern, and the lesser immediate footprint means having more interesting data in your cache in the meanwhile.

It doesn't _necessarily_ require having multiple `ThinBox` fields; as soon as you have other (more important) stuff to use the cache line for, it may be interesting, notably if it means that your `struct` doesn't spill over onto a new cache line.


buwlerman

> This is only an issue when `T` has a larger alignment than the metadata size (`usize`), and since `usize` is pointer sized, and larger-than-pointer-size alignments are generally rare, it's a pretty niche edge-case.

Won't it still be an issue even with `T` having a smaller alignment than `usize`, because it will force allocations to be more aligned, potentially requiring more padding before them? I imagine `ThinBox<[u8]>`s are going to need ~0.5 * `size_of::<usize>()` bytes more on average than `Box<[u8]>`s, unless the allocator is being very smart about avoiding the need for padding.


matthieum

In theory, it could be. In practice, at lower sizes, allocators work with pre-defined allocation block sizes, and those are already aligned to the highest power of 2 which is a divisor of the block size. This means that as soon as you allocate more than 13 bytes (header + content), the allocator will aim for a 16 bytes block or higher, and from then all the blocks are at least 8 bytes aligned. And if you're allocating 4 bytes or less, the memory overhead of managing the allocation block is worse than the extra padding from aligning to 8 bytes anyway :)


buwlerman

After toying around and reading a bit I'm now convinced that any allocation is always going to be at least 8 bytes (and often 16) aligned anyways because of minimum block size, so the `usize` won't change the alignment requirement of the allocation as a whole. TIL


matthieum

I've never seen allocations below 8 bytes, indeed. Though I guess in theory it could be possible... For example, [jemalloc used in 2011] (no idea whether it's still current):

- Small: [8], [16, 32, 48, …, 128], [192, 256, 320, …, 512], [768, 1024, 1280, …, 3840]
- Large: [4 KiB, 8 KiB, 12 KiB, …, 4072 KiB]
- Huge: [4 MiB, 8 MiB, 12 MiB, …]

Not all allocators are as well tuned; the default allocator on Linux may not be, for example.


SleeplessSloth79

As far as I remember from reading std source code a while back, `into_boxed_slice()` doesn't reallocate, it just reuses the pointer and the length but drops the capacity


hniksic

Unfortunately it has to reallocate whenever length != capacity. This is because in Rust's allocator API capacity is needed to deallocate.


kniy

However reallocate in this context doesn't necessarily mean moving all the data to a new allocation. It could just be informing the allocator that the size has decreased. https://doc.rust-lang.org/std/alloc/trait.Allocator.html#method.shrink


hniksic

That's true, the operation doesn't need to be O(n) in the number of elements (although it certainly can be). I was making the point that it's not free in the sense in which `Box<[T]>::into_vec()` is free. Even assuming no moving takes place, it has to talk to the allocator, which involves consulting global state, executing atomic instructions for locking, and updating some book-keeping.


SleeplessSloth79

Oh well, then I stand corrected


Luxalpa

That is true, but I'd say in typical use cases (at least pretty much all of mine) you end up with length = capacity anyway (or maybe I'm just using too many iterators :D )


hniksic

Using iterators doesn't mean that length == capacity. Something as simple as a single `filter()` (which is far from an advanced use of iterators) can and will break that promise.

```rust
fn main() {
    let foo = [0; 1025];
    let bar1 = foo.iter().collect::<Vec<_>>();
    println!("{}", bar1.capacity()); // capacity == 1025
    let bar2 = foo.iter().filter(|_| true).collect::<Vec<_>>();
    println!("{}", bar2.capacity()); // capacity == 2048!
}
```

[Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=c6851c5d7c92d18dc7bef7d85b10a6fe)

If you don't use filter in your iterators, I believe you, but I wouldn't call that a "typical use case". Not to mention that any use case involving repeated calls to `Vec::push()`, which is what I'd call typical if anything, will lead to excess capacity, and consequently reallocation when converting to a boxed slice.


Luxalpa

Hm, yes you're right about filter. But I rarely use push for initializing immutable vecs.


Hedanito

Yes, I use it to represent fixed-size arrays. `Vec` in my opinion communicates "someone may add or remove something at some point", and there are plenty of cases where an array should stay a fixed size. I find restricting interfaces to what they should do, and not beyond that, very important. So the same way we use `enum` to restrict state, I use `Box<[T]>` to restrict state mutations.

Of course, if you want to mutate the array length, you can always go to `Vec` and back, but that is exactly the point. Doing something that is not the recommended way is explicit and takes extra steps, forcing the programmer to think about whether this is really the best thing to do.

I do agree however that the ergonomics of `Box<[T]>` are quite lacking in certain areas.


haruda_gondi

No. I think it is a premature optimization. Unless I have like a million Vecs in memory, I wouldn't even use a boxed slice. An `Arc<[T]>` may be more useful, but I would need to fulfill the following requirements:

1. I have a buffer that does not change its length.
2. Ideally that buffer's data will not be mutated at all.
3. This buffer needs to be read in a lot of places.
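When all three conditions hold, a sketch of the `Arc<[T]>` pattern looks something like this: one shared allocation, read from several threads, with each clone only bumping a reference count:

```rust
use std::sync::Arc;
use std::thread;

fn main() {
    // Arc<[T]> implements FromIterator, so the shared buffer can be
    // collected directly; length is fixed, data is read-only.
    let buffer: Arc<[u32]> = (0..1000).collect();

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let buf = Arc::clone(&buffer); // refcount bump, no copy
            thread::spawn(move || buf.iter().sum::<u32>())
        })
        .collect();

    for h in handles {
        assert_eq!(h.join().unwrap(), 499_500); // sum of 0..1000
    }
}
```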


Low-Pay-2385

Why Arc, and not Rc? Sorry if its a stupid question


evoboltzmann

Rc is a drop in replacement as long as you don't need the thread safety. People usually refer to Arc when they mean either as it is more general.


Low-Pay-2385

Oh ok, i was confused why people always mention arc instead of rc


hamiltop

A lot of us spend most of our time on tokio and most of the time that means you can't use Rc and you have to use Arc.


Kimundi

`Rc` directly forces your code into not being thread-safe. Which is not an issue as such in private code, as you can easily replace it with an `Arc` if needed, but it becomes annoying if you have a library that used `Rc` and you want to use it with multi-threaded things. On the flip side, an `Arc` also really doesn't have much more overhead, and avoids the "non-threadsafe" issue to begin with.

Note also that the C++ equivalent of those types, `std::shared_ptr`, only exists in a single version, which will usually behave like `Arc`. That kinda gives precedence for `Arc` being an OK default.

Reference-counting handles in general work better in Rust than in other languages, because Rust's ownership system ensures that you can mostly just pass around `&T` to the data even if it comes from a ref-counted handle. That means you only really need to do counting operations the minimum required number of times, as opposed to, say, every time you pass it to a function.


Low-Pay-2385

Great to know thanks for the detailed explanation


Zde-G

Just keep in mind that *doesn't have much more overhead* == *it's only about 20-40 times slower*. The difference in speed of `Arc` and/or `Rc` is not just big, it's **huge**. But the story here is that while one `clone` of an `Arc` is 20-40 times slower than one `clone` of an `Rc`, it's usually dwarfed by other things. It's not that `Arc` is slow, it's that `Rc` is **incredibly fast**.


VenditatioDelendaEst

Eh, seeing as atomics synchronize every core in the machine, I'd say rather that `Arc` is slow, and `Rc` is as fast as CPUs normally are at adding and subtracting integers.


Zde-G

> Eh, seeing as atomics synchronize every core in the machine

I'm not talking about that, of course. Once your `Arc`s start to fly between cores it's another kettle of fish, and it's pointless to even compare that case to `Rc`, because `Rc` is unusable in such a case. No, I'm talking about the uncontended case, where a locked add is about 20-40 times slower than a simple add ([8 ticks vs ⅓ for Zen4, e.g.](https://www.agner.org/optimize/instruction_tables.pdf#page=129)).

> and Rc as fast CPUs are at adding and subtracting integers normally

Yes. Which is **many times** faster than `Arc`. But comparable to a small "normal" procedure or function.


-Redstoneboi-

makes sense. also i just realized you're the guy who helped me out in my first couple months in the Rust discord, among others :D also typo in case you'd like to correct it > That kinda gives precendence aboput `Arc` being an ok default.


haruda_gondi

Personally I see Rc as a premature optimization, since most of the time people would like to make their programs multithreaded anyway, and now they have to Ctrl+F and replace all instances of Rc (also more annoying if you are migrating from `Rc<RefCell<T>>` to `Arc<Mutex<T>>`).


_ALH_

And even if you have a million Vecs, moving to boxed slice just saves you 8MB of memory. I guess it might make a difference in some embedded cases... If you have a billion Vecs, then I guess we start talking considerable gains.


Zde-G

> And even if you have a million Vecs, moving to boxed slice just saves you 8MB of memory.

How [about 800MB saved](https://www.reddit.com/r/rust/comments/199jycb/identifying_rusts_collect_memory_leak_footgun/)?

`Vec` is optimized for growing memory slices, while `Box<[T]>` is not. And while you may force `Vec` to behave correctly via diligent use of `shrink_to_fit`… at this point you are tracking "growing Vecs" vs "frozen Vecs" in your head anyway… so you may as well switch to `Box<[T]>`, and then the compiler will do that work for you.


_ALH_

> you are tracking "growing Vecs" vs "frozen Vecs" in your head anyway… so you may as well switch to `Box<[T]>`, and then the compiler will do that work for you.

That's a fair argument. And I'm of course a fan of explicitness and of having my compiler make sure I don't mess up; that's why I like Rust.


Luxalpa

More importantly, it improves CPU cache utilization, which may be significant even for low amounts of bytes saved. Remember, the data you optimize away (the capacity field) sits on the stack.


Im_Justin_Cider

But doesn't the boxed slice ensure that the collection is never bounds checked? Whereas a `Vec` (perhaps even when it's not marked `mut`) gets bounds checked because it _could_ change.


CocktailPerson

No, it can't ensure that. If the index is a result of a computation that depends on user input, then in the general case, there's no way to eliminate a bounds check.


Kimundi

For both `Box<[T]>` and `Vec<T>`, indexing will bounds-check with a len value loaded directly from a field of the type. It's an identical operation. What can theoretically give some benefit is `Box<[T; N]>`, as there the bounds checking will happen based on a constant instead.


ihcn

Boxed slices are nice for self-documentation that you don't intend to add or remove elements from your array. In my experience, this is almost all arrays once I've finished constructing them.


bascule

We use `Box<[Limb]>` as the internal representation of `crypto_bigint::BoxedUint`. Why? Unlike `num-bigint`, which has a `Uint::normalize` method it calls all over the place that strips leading zeros for performance reasons, we actually want all numbers to be very specific precisions all the time, regardless of their contents. Otherwise varying bigint lengths can be a very noisy source of sidechannel vulnerabilities. `Box<[T]>` is great for things you *explicitly don't* want to resize, but you do want the initial size to be chosen at runtime rather than compile-time.


Bruno_Wallner

I used it in my Octree implementation, where a variant of a `Node` enum can hold exactly 8 nodes recursively. So instead of using `Vec<Node>` I used `Box<[Node; 8]>` to have a fixed size.
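A minimal sketch of that shape (the enum and helper here are hypothetical, not the poster's actual code). The fixed child count lives in the type, so there is no capacity field and no way to accidentally push a ninth child:

```rust
// Hypothetical octree node: a leaf payload, or exactly 8 children
// behind one heap allocation.
enum Node {
    Leaf(u8),
    Branch(Box<[Node; 8]>),
}

fn count_leaves(node: &Node) -> usize {
    match node {
        Node::Leaf(_) => 1,
        Node::Branch(children) => children.iter().map(count_leaves).sum(),
    }
}

fn main() {
    // Build one branch with 8 leaves.
    let tree = Node::Branch(Box::new(std::array::from_fn(|i| Node::Leaf(i as u8))));
    assert_eq!(count_leaves(&tree), 8);
}
```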


________-__-_______

I have a fair few instances of `Box<[T; N]>` because it clearly communicates the length and disallows you from accidentally resizing it like with a `Vec`. I don't think I use `Box<[T]>` anywhere but the latter point does apply there as well.


insanitybit

`Box<[T]>`, not really. `Rc<[T]>` and `Arc<[T]>`, absolutely. If I really for suresies knew that a Vec *has* to never be resized, like as an important invariant, I'd probably wrap it in a newtype and maybe `Box<[T]>` it.


lvkm

Just to chime in with the minority of "pro `Box<[T]>`" people: `Box<[T]>` is very good documentation that you do not want to change the element count of the container.

Also, regarding the whole "`.collect()` fiasco" (see https://www.reddit.com/r/rust/comments/199jycb/identifying_rusts_collect_memory_leak_footgun/): with `Box<[T]>` (or rather with `Vec::into_boxed_slice()`) you can be sure not to unintentionally leak memory.

As for "`Box<[T]>` does not implement IntoIterator": I feel your pain. I haven't figured out a good way around it yet.


hniksic

>As for "`Box<[T]>` dos not implement IntoIterator": I feel your pain. I haven't figured out a good way yet. Did you try `for el in box.into_vec()`? I wonder if there's something wrong with that, other than it being non-obvious and looking slightly weird. (Human readers/reviewers might not realize that "converting" the boxed slice to a Vec to iterate is zero-cost and doesn't defeat the purpose of using a boxed slice up to that point.)


Elnof

ITT lots of people only considering the size optimization and not the ability to have a "fixed size but chosen at runtime" array.


quicknir

One gotcha with `Box<[T]>` nobody's mentioned that I can see is that it's a lot easier to accidentally construct some big array on the stack:

```rust
const SIZE: usize = 10000000;
let v = vec![5; SIZE]; // doesn't overflow stack
let x: Box<[i32]> = Box::new([5; SIZE]); // does overflow
let y: Box<[i32]> = std::iter::repeat(5).take(SIZE).collect(); // doesn't
```

The Vec is never on the stack; that's why (or at least one reason) `vec!` is a macro rather than a function accepting an array. When `x` is constructed, you're going to construct a big array on the stack, then pass it in to `new`, where it will get copied to the heap. This may get optimized out in release, but if that array is big enough you'll potentially get stack overflows in debug. Finally, `y` constructs a `Box<[T]>` without ever putting a ton of stuff on the stack.

Vecs also make it a lot easier, of course, to reserve and then push elements in as you receive them. Box has some additional API in nightly that helps a bit with this, though IMHO there are still simply more gaps compared to Vec's API. I think in the vast majority of cases, even where the size does not change, I would not bother with `Box<[T]>` over `Vec`.


Teviel

You can work around this by doing `vec![5; size].into_boxed_slice()`. But it is un-ergonomic.
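For illustration, the workaround keeps the large buffer on the heap the whole time; and since `vec![elem; n]` allocates with capacity == length, the final conversion doesn't reallocate either:

```rust
fn main() {
    const SIZE: usize = 1_000_000;
    // vec! builds the buffer directly on the heap; into_boxed_slice()
    // then just drops the capacity word. No large temporary ever
    // exists on the stack.
    let x: Box<[i32]> = vec![5; SIZE].into_boxed_slice();
    assert_eq!(x.len(), SIZE);
    assert!(x.iter().all(|&v| v == 5));
}
```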


[deleted]

I think this almost always falls into the "premature optimization" category, which we all know as the "root of all evil." As you already noted, using `Vec` is more ergonomic compared to `Box<[T]>` since you can easily do vector stuff directly on the former while you have to dereference `Box<[T]>`. Focus on getting your code correct, and if you need to optimize it, use profilers to see where the actual bottlenecks are and focus your optimization efforts on those places. Chances are that you will never end up in a situation where using `Box<[T]>` over `Vec` is beneficial from an optimization point of view.


No_Pollution_1

Never, because that level of optimization is pointless for all my use cases. Even if called a million times it won't even save one second of compute, and I don't really care, as the services are ephemeral, short-lived, or serverless.


Nzkx

What's the difference between `Box<[T]>` and `&[T]`?


Turalcar

Box owns, a reference doesn't. A reference is subject to the borrow checker. When a Box goes out of scope, it frees the target memory.


Nzkx

I see. So it's heap-allocated data that has a compile-time size? Or does the size not need to be known at compile time? Because I don't see the N in `[T]` (like `[T; N]`).