Table of Contents
This page is heavily inspired by the well-known bithacks page. I’ll be documenting bithacks I had to make that I couldn’t find there.
This page, unlike the original one, will contain explanations.
All the code will be in Zig.
I don’t think any of this code could be copyrighted so I couldn’t even put it in CC-0 even if I wanted to. Use as you wish.
Detect Which Nibbles or Bytes in a Word Is Nonzero#
Motivation#
Useful to produce a layer of a tree encoded through a morton curve, given the layer below.
Functions#
64 bits into 8 bits, grouped by 8 bits#
pub inline fn funny_merge_64_8(v: u64) u8 {
var w = v;
w |= ((w & 0xAAAAAAAAAAAAAAAA) >> 1) | ((w & 0x5555555555555555) << 1);
w |= ((w & 0xCCCCCCCCCCCCCCCC) >> 2) | ((w & 0x3333333333333333) << 2);
w |= ((w & 0xF0F0F0F0F0F0F0F0) >> 4) | ((w & 0x0F0F0F0F0F0F0F0F) << 4);
w = (w & 0x8040_2010_0804_0201) *% 0x0101010101010101;
return @intCast(w >> 56);
}
32 bits into 4 bits, grouped by 8 bits#
pub inline fn funny_merge_32_8(v: u32) u4 {
var w = v;
w |= ((w & 0xAAAAAAAA) >> 1) | ((w & 0x55555555) << 1);
w |= ((w & 0xCCCCCCCC) >> 2) | ((w & 0x33333333) << 2);
w |= ((w & 0xF0F0F0F0) >> 4) | ((w & 0x0F0F0F0F) << 4);
w = (w & 0x0804_0201) *% 0x01010101;
return @intCast(w >> 28);
}
32 bits into 8 bits, grouped by 4 bits#
pub inline fn funny_merge_32_4(v: u32) u8 {
const or1 = val | ((val & 0xAAAAAAA) >> 1) | ((val & 0x55555555) << 1);
const or2 = or1 | ((or1 & 0xCCCCCCC) >> 2) | ((or1 & 0x33333333) << 2);
const res_hi = (((or2 >> 16) & 0x8421) * 0x1111) >> 12;
const res_lo = (((or2 >> 0) & 0x8421) * 0x1111) >> 12;
return @intCast((res_hi << 4) | res_lo);
}
Explanation#
Notice how the mask and ths shift are reversed here. It doesn’t really matter in what order you make them.
Say you have a number: 0x2305007B
and you want to know what bytes are nonzero in the form: 0b1101
.
-
convert the number into
0xFF0F00FF
:- do an OR between pairs of single bits to get a
0x330F00FF
:
v |= ((v >> 1) & 0x55555555) | ((v << 1) & 0xAAAAAAAA)
- do an OR between pairs of two bits to get a
0xFF0F00FF
:
v |= ((v >> 2) & 0x33333333) | ((v << 2) & 0xCCCCCCCC)
- do an OR between pairs of four bits to get a
0xFFFF00FF
:
v |= ((v >> 4) & 0x0F0F0F0F) | ((v << 4) & 0xF0F0F0F0)
- do an OR between pairs of single bits to get a
-
convert the number into
0b1101
:-
mask the number with
0x08040201
:v &= 0x08040201
This has the effect of turning every filled byte into the value they would end up contributing to the final number.
The number we end up with here is
0x08040001
. the08
byte represents the bit that would end up contributing08
to the final result.This will become relevant in the next step:
-
multiply with
0x01010101
:v *= 0x01010101
Since these values are unsigned (signed multiplication works differently), the following long multiplication will take place:
08040001 01010101 -------- 08040001 08040001 08040001 08040001 -------------- 080C0C0D050101
But since the overflows have no effect in unsigned multiplication, the process becomes:
08040001 01010101 -------- 08040001 040001 0001 01 -------- 0D050101
Notice how the leftmost column is the addition of all the contributions, meaning that that’s our result, which does check out: we were looking for
0b1101
aka0xD
Also notice how there could be no carries under this multiplication.
-
Shift right 24:
v >>= 24
And so we have our result.
-
Pad every bit in a value with two zeroes#
Motivation#
3-D morton curves.
Function#
inline fn pad_3(v: u4) u12 {
var w: u48 = @intCast(v);
w *= 0x001_001_001_001;
w &= 0x008_004_002_001;
w *%= 0x001_004_010_040;
return @intCast(w >> 36);
}
inline fn pad(v: anytype) @Type(std.builtin.Type{ .Int = .{
.bits = @typeInfo(@TypeOf(v)).Int.bits * 3,
.signedness = .unsigned,
} }) {
const T = @TypeOf(v);
const v_bits = @typeInfo(T).Int.bits;
const full_nibbles = v_bits / 4;
const excess_bits = v_bits % 4;
const Ret = @Type(std.builtin.Type{ .Int = .{
.bits = v_bits * 3,
.signedness = .unsigned,
} });
var ret: Ret = 0;
//aaaabbbbccc
//0 1
//1 0
inline for (0..full_nibbles) |i| {
const j = full_nibbles - i - 1;
const shr = j * 4 + excess_bits;
const nibble: u4 = @intCast((v >> @intCast(shr)) & 0xF);
const padded_nibble: Ret = @intCast(pad_3(nibble));
ret <<= 12;
ret |= @intCast(padded_nibble);
}
if (excess_bits != 0) {
const mask = (@as(T, 1) << excess_bits) - 1;
const excess: u4 = @intCast(v & mask);
const padded_excess: Ret = @intCast(pad_3(excess));
const omask = (@as(Ret, 1) << (excess_bits * 3)) - 1;
ret <<= excess_bits * 3;
ret |= padded_excess & omask;
}
return ret;
}
Explanation#
assume v = 0b1101
or 0x0C
-
copy the value into all 12 bit sections of a 48 bit number:
v *= 0x001001001001
so v became
0x00C00C00C00C
-
filter out the contributed bits alone
What do I mean by this? The end goal is to have a 48-bit number with 4 12-bit groups, wherein the nth bit is set if the nth bit in the original value is set.
0b0001
becomes0x000000000001
and0b1010
becomes0x004000002000
.v &= 0x008004002001
v
became0x008004000001
-
shift & sum
So in the original value, we want to end up having shifted the 0th bit left by 0, the 1st bit by 2, the 2nd by 4 and the 3rd by 6, right? Those shifts amount to multiplications by 1, 4, 16, and 64. And remember from the first hack what multiplication can do:
008004000001 001004010040 ------------ 200100000040 040000010 000004 001
Notice how the multiplier being “reversed” is actually correct. And notice how the last column, once again, represents our desired result:
0x200
is0b1000
shifted left by 6,0x040
is0b0100
shifted left by 4 and0x001
is0b0001
shifted left by, well, 0.So the operation here is:
v *= 0x001004010040
-
shift the last column right
v >>= 36
And we’re done