This page is heavily inspired by the well-known bithacks page. I’ll be documenting bithacks I had to make that I couldn’t find there.

This page, unlike the original one, will contain explanations.

All the code will be in Zig.

I don’t think any of this code could be copyrighted so I couldn’t even put it in CC-0 even if I wanted to. Use as you wish.

Detect Which Nibbles or Bytes in a Word Is Nonzero#

Motivation#

Useful to produce a layer of a tree encoded through a morton curve, given the layer below.

Functions#

64 bits into 8 bits, grouped by 8 bits#

pub inline fn funny_merge_64_8(v: u64) u8 {
    var w = v;

    w |= ((w & 0xAAAAAAAAAAAAAAAA) >> 1) | ((w & 0x5555555555555555) << 1);
    w |= ((w & 0xCCCCCCCCCCCCCCCC) >> 2) | ((w & 0x3333333333333333) << 2);
    w |= ((w & 0xF0F0F0F0F0F0F0F0) >> 4) | ((w & 0x0F0F0F0F0F0F0F0F) << 4);

    w = (w & 0x8040_2010_0804_0201) *% 0x0101010101010101;

    return @intCast(w >> 56);
}

32 bits into 4 bits, grouped by 8 bits#

pub inline fn funny_merge_32_8(v: u32) u4 {
    var w = v;

    w |= ((w & 0xAAAAAAAA) >> 1) | ((w & 0x55555555) << 1);
    w |= ((w & 0xCCCCCCCC) >> 2) | ((w & 0x33333333) << 2);
    w |= ((w & 0xF0F0F0F0) >> 4) | ((w & 0x0F0F0F0F) << 4);

    w = (w & 0x0804_0201) *% 0x01010101;

    return @intCast(w >> 28);
}

32 bits into 8 bits, grouped by 4 bits#

pub inline fn funny_merge_32_4(v: u32) u8 {
    const or1 = val | ((val & 0xAAAAAAA) >> 1) | ((val & 0x55555555) << 1);
    const or2 = or1 | ((or1 & 0xCCCCCCC) >> 2) | ((or1 & 0x33333333) << 2);

    const res_hi = (((or2 >> 16) & 0x8421) * 0x1111) >> 12;
    const res_lo = (((or2 >>  0) & 0x8421) * 0x1111) >> 12;

    return @intCast((res_hi << 4) | res_lo);
}

Explanation#

Notice how the mask and ths shift are reversed here. It doesn’t really matter in what order you make them.

Say you have a number: 0x2305007B and you want to know what bytes are nonzero in the form: 0b1101.

  1. convert the number into 0xFF0F00FF:

    1. do an OR between pairs of single bits to get a 0x330F00FF:

    v |= ((v >> 1) & 0x55555555) | ((v << 1) & 0xAAAAAAAA)

    1. do an OR between pairs of two bits to get a 0xFF0F00FF:

    v |= ((v >> 2) & 0x33333333) | ((v << 2) & 0xCCCCCCCC)

    1. do an OR between pairs of four bits to get a 0xFFFF00FF:

    v |= ((v >> 4) & 0x0F0F0F0F) | ((v << 4) & 0xF0F0F0F0)

  2. convert the number into 0b1101:

    1. mask the number with 0x08040201:

      v &= 0x08040201

      This has the effect of turning every filled byte into the value they would end up contributing to the final number.

      The number we end up with here is 0x08040001. the 08 byte represents the bit that would end up contributing 08 to the final result.

      This will become relevant in the next step:

    2. multiply with 0x01010101:

      v *= 0x01010101

      Since these values are unsigned (signed multiplication works differently), the following long multiplication will take place:

            08040001
            01010101
            --------
            08040001
          08040001
        08040001
      08040001
      --------------
      080C0C0D050101
      

      But since the overflows have no effect in unsigned multiplication, the process becomes:

      08040001
      01010101
      --------
      08040001
      040001
      0001
      01
      --------
      0D050101
      

      Notice how the leftmost column is the addition of all the contributions, meaning that that’s our result, which does check out: we were looking for 0b1101 aka 0xD

      Also notice how there could be no carries under this multiplication.

    3. Shift right 24:

      v >>= 24

      And so we have our result.

Pad every bit in a value with two zeroes#

Motivation#

3-D morton curves.

Function#

inline fn pad_3(v: u4) u12 {
    var w: u48 = @intCast(v);

    w *= 0x001_001_001_001;
    w &= 0x008_004_002_001;
    w *%= 0x001_004_010_040;

    return @intCast(w >> 36);
}

inline fn pad(v: anytype) @Type(std.builtin.Type{ .Int = .{
    .bits = @typeInfo(@TypeOf(v)).Int.bits * 3,
    .signedness = .unsigned,
} }) {
    const T = @TypeOf(v);
    const v_bits = @typeInfo(T).Int.bits;
    const full_nibbles = v_bits / 4;
    const excess_bits = v_bits % 4;

    const Ret = @Type(std.builtin.Type{ .Int = .{
        .bits = v_bits * 3,
        .signedness = .unsigned,
    } });

    var ret: Ret = 0;

    //aaaabbbbccc
    //0   1
    //1   0
    inline for (0..full_nibbles) |i| {
        const j = full_nibbles - i - 1;
        const shr = j * 4 + excess_bits;

        const nibble: u4 = @intCast((v >> @intCast(shr)) & 0xF);
        const padded_nibble: Ret = @intCast(pad_3(nibble));

        ret <<= 12;
        ret |= @intCast(padded_nibble);
    }

    if (excess_bits != 0) {
        const mask = (@as(T, 1) << excess_bits) - 1;
        const excess: u4 = @intCast(v & mask);
        const padded_excess: Ret = @intCast(pad_3(excess));

        const omask = (@as(Ret, 1) << (excess_bits * 3)) - 1;
        ret <<= excess_bits * 3;
        ret |= padded_excess & omask;
    }

    return ret;
}

Explanation#

assume v = 0b1101 or 0x0C

  1. copy the value into all 12 bit sections of a 48 bit number:

    v *= 0x001001001001

    so v became 0x00C00C00C00C

  2. filter out the contributed bits alone

    What do I mean by this? The end goal is to have a 48-bit number with 4 12-bit groups, wherein the nth bit is set if the nth bit in the original value is set. 0b0001 becomes 0x000000000001 and 0b1010 becomes 0x004000002000.

    v &= 0x008004002001

    v became 0x008004000001

  3. shift & sum

    So in the original value, we want to end up having shifted the 0th bit left by 0, the 1st bit by 2, the 2nd by 4 and the 3rd by 6, right? Those shifts amount to multiplications by 1, 4, 16, and 64. And remember from the first hack what multiplication can do:

    008004000001
    001004010040
    ------------
    200100000040
    040000010
    000004
    001
    

    Notice how the multiplier being “reversed” is actually correct. And notice how the last column, once again, represents our desired result: 0x200 is 0b1000 shifted left by 6, 0x040 is 0b0100 shifted left by 4 and 0x001 is 0b0001 shifted left by, well, 0.

    So the operation here is: v *= 0x001004010040

  4. shift the last column right

    v >>= 36

    And we’re done