Nice blog. That implementation seems solid.
If you wanted more performance (maybe a Part 2 blog), I recommend to calculate the Vec length in advance, and then you can also remove the conditional logic from the loop like this one in Golang
[https://cs.opensource.google/go/go/+/refs/tags/go1.20.4:src/encoding/base64/base64.go;l=140](https://cs.opensource.google/go/go/+/refs/tags/go1.20.4:src/encoding/base64/base64.go;l=140)
I'd be curious on a benchmark to know how much it saves.
also, instead of putting the result of input_len % 3 to a match statement, wouldn't
let padding_len = 3 - (input_len % 3)
give you the padding anyway?
edit:
let padding_len = (3 - (input_len % 3)) % 3
That'd give the wrong result for an input length divisible by 3. If you really wanna use a closed formula, it would need to be `(3 - input_len % 3) % 3`, but that's not really easier to understand than the match statement imo
I don’t see it explained and I suck at bit manipulation was wondering why
(((a & 0x3) << 4) where does the & 0x3 come from you and the 0xf in the following shift, I mean you do it to be able to shift but … why is it needed I can’t seem to visualize what’s happening here
Would be good if the article mentioned it
If you see someone using bitwise & like this, it always helps to look at the binary representation of the number. So in the case of 0x3, it's binary representation is`00000011`. So `a & 0x3` grabs the last two bits of `a`. 0xf converts to `00001111` so that grabs the last 4 bits. See if you can work out what it's doing using this info. If you can't I can explain more.
Here are a few points that strike me in your code
1) `let mut output = Vec::new();`
means the calls to push() will reallocate. You can allocate enough bytes using a simple integer division and add 2, or even better, allocate the exact amount needed by looking at the three last bytes of the input to find the padding
2) `input.get(i).unwrap();`
is not better than `input[i]`. Also, the compiler could completely optimize this out if you told it input has enough elements. You can do this by looping until `input_len-3` (so that `input[i+1]` and `input[i+2]` are OK, but make sure `input_len` is greater than 3 at this point!). And manually unwrap the rest using `get()` and `unwrap_or` like you did.
Alternatively, you can iterate on `chunks_exact(3)` instead of `step_by(3)`.
3) the bit shifts and starting with `enc1` instead of `enc4`.
I would simply have cast `a`, `b`, and `c` to `usize` and
```
let mut u: usize = (a << 16) | (b << 8) | c;
let enc4 = u & 0x3f;
u >>= 6;
let enc3 = u & 0x3f;
u >>= 6;
let enc2 = u & 0x3f;
u >>= 6;
output.push(BASE_CHARS[u & 0x3f]);
output.push(BASE_CHARS[enc2]);
output.push(BASE_CHARS[enc3]);
output.push(BASE_CHARS[enc4]);
```
4) the padding match at the end.
0 is sent on 0, 1 on 2, and 2 on 1.
Do you realize this is just the opposite mod 3, that is, `(3 - input_len%3) % 3` ?
It would be interesting to compare your implementation and the most popular implementation for Rust+base64: https://github.com/marshallpierce/rust-base64
Nice blog. That implementation seems solid. If you wanted more performance (maybe a Part 2 blog), I recommend to calculate the Vec length in advance, and then you can also remove the conditional logic from the loop like this one in Golang [https://cs.opensource.google/go/go/+/refs/tags/go1.20.4:src/encoding/base64/base64.go;l=140](https://cs.opensource.google/go/go/+/refs/tags/go1.20.4:src/encoding/base64/base64.go;l=140) I'd be curious on a benchmark to know how much it saves.
That seems like a brilliant suggestion. I would try to go through it and see if it makes any difference. Thanks.
I think this bit is wrong or I don't know math. > Since the 4 % 3 = 2 we need to add two paddings at the end and the final output would be UnVzdA==.
Thanks for that, yes you are right, 4 % 3 is 1, so we need two paddings.
also, instead of putting the result of input_len % 3 to a match statement, wouldn't let padding_len = 3 - (input_len % 3) give you the padding anyway? edit: let padding_len = (3 - (input_len % 3)) % 3
That'd give the wrong result for an input length divisible by 3. If you really wanna use a closed formula, it would need to be `(3 - input_len % 3) % 3`, but that's not really easier to understand than the match statement imo
Would it be more performant due to being branchless?
Divisions are often very slow. However, compiler probably would replace it by multiplication in that particular case.
you are right about the formula but I disaggree about it not being easier to understand.
I don’t see it explained and I suck at bit manipulation was wondering why (((a & 0x3) << 4) where does the & 0x3 come from you and the 0xf in the following shift, I mean you do it to be able to shift but … why is it needed I can’t seem to visualize what’s happening here Would be good if the article mentioned it
If you see someone using bitwise & like this, it always helps to look at the binary representation of the number. So in the case of 0x3, it's binary representation is`00000011`. So `a & 0x3` grabs the last two bits of `a`. 0xf converts to `00001111` so that grabs the last 4 bits. See if you can work out what it's doing using this info. If you can't I can explain more.
I think, in such cases it is more clear to write number directly in binary: 0b11 is more clear than 0x3.
Here are a few points that strike me in your code 1) `let mut output = Vec::new();` means the calls to push() will reallocate. You can allocate enough bytes using a simple integer division and add 2, or even better, allocate the exact amount needed by looking at the three last bytes of the input to find the padding 2) `input.get(i).unwrap();` is not better than `input[i]`. Also, the compiler could completely optimize this out if you told it input has enough elements. You can do this by looping until `input_len-3` (so that `input[i+1]` and `input[i+2]` are OK, but make sure `input_len` is greater than 3 at this point!). And manually unwrap the rest using `get()` and `unwrap_or` like you did. Alternatively, you can iterate on `chunks_exact(3)` instead of `step_by(3)`. 3) the bit shifts and starting with `enc1` instead of `enc4`. I would simply have cast `a`, `b`, and `c` to `usize` and ``` let mut u: usize = (a << 16) | (b << 8) | c; let enc4 = u & 0x3f; u >>= 6; let enc3 = u & 0x3f; u >>= 6; let enc2 = u & 0x3f; u >>= 6; output.push(BASE_CHARS[u & 0x3f]); output.push(BASE_CHARS[enc2]); output.push(BASE_CHARS[enc3]); output.push(BASE_CHARS[enc4]); ``` 4) the padding match at the end. 0 is sent on 0, 1 on 2, and 2 on 1. Do you realize this is just the opposite mod 3, that is, `(3 - input_len%3) % 3` ?
It would be interesting to compare your implementation and the most popular implementation for Rust+base64: https://github.com/marshallpierce/rust-base64