:: a cozy blog

This page is heavily inspired by the well-known bithacks page. I’ll be documenting bithacks I had to make that I couldn’t find there.

This page, unlike the original one, will contain explanations.

All the code will be in Zig.

I don’t think any of this code could be copyrighted so I couldn’t even put it in CC-0 even if I wanted to. Use as you wish.

Detect Which Nibbles or Bytes in a Word Is Nonzero#

Motivation#

Useful to produce a layer of a tree encoded through a morton curve, given the layer below.

Functions#

64 bits into 8 bits, grouped by 8 bits#

pub inline fn funny_merge_64_8(v: u64) u8 {
    var w = v;

    w |= ((w & 0xAAAAAAAAAAAAAAAA) >> 1) | ((w & 0x5555555555555555) << 1);
    w |= ((w & 0xCCCCCCCCCCCCCCCC) >> 2) | ((w & 0x3333333333333333) << 2);
    w |= ((w & 0xF0F0F0F0F0F0F0F0) >> 4) | ((w & 0x0F0F0F0F0F0F0F0F) << 4);

    w = (w & 0x8040_2010_0804_0201) *% 0x0101010101010101;

    return @intCast(w >> 56);
}

32 bits into 4 bits, grouped by 8 bits#

pub inline fn funny_merge_32_8(v: u32) u4 {
    var w = v;

    w |= ((w & 0xAAAAAAAA) >> 1) | ((w & 0x55555555) << 1);
    w |= ((w & 0xCCCCCCCC) >> 2) | ((w & 0x33333333) << 2);
    w |= ((w & 0xF0F0F0F0) >> 4) | ((w & 0x0F0F0F0F) << 4);

    w = (w & 0x0804_0201) *% 0x01010101;

    return @intCast(w >> 28);
}

32 bits into 8 bits, grouped by 4 bits#

pub inline fn funny_merge_32_4(v: u32) u8 {
    const or1 = val | ((val & 0xAAAAAAA) >> 1) | ((val & 0x55555555) << 1);
    const or2 = or1 | ((or1 & 0xCCCCCCC) >> 2) | ((or1 & 0x33333333) << 2);

    const res_hi = (((or2 >> 16) & 0x8421) * 0x1111) >> 12;
    const res_lo = (((or2 >>  0) & 0x8421) * 0x1111) >> 12;

    return @intCast((res_hi << 4) | res_lo);
}

Explanation#

Notice how the mask and ths shift are reversed here. It doesn’t really matter in what order you make them.

Say you have a number: 0x2305007B and you want to know what bytes are nonzero in the form: 0b1101.

convert the number into 0xFF0F00FF:
1. do an OR between pairs of single bits to get a 0x330F00FF:
v |= ((v >> 1) & 0x55555555) | ((v << 1) & 0xAAAAAAAA)
1. do an OR between pairs of two bits to get a 0xFF0F00FF:
v |= ((v >> 2) & 0x33333333) | ((v << 2) & 0xCCCCCCCC)
1. do an OR between pairs of four bits to get a 0xFFFF00FF:
v |= ((v >> 4) & 0x0F0F0F0F) | ((v << 4) & 0xF0F0F0F0)
convert the number into 0b1101:
1. mask the number with 0x08040201:
  
  v &= 0x08040201
  
  This has the effect of turning every filled byte into the value they would end up contributing to the final number.
  
  The number we end up with here is 0x08040001. the 08 byte represents the bit that would end up contributing 08 to the final result.
  
  This will become relevant in the next step:
2. multiply with 0x01010101:
  
  v *= 0x01010101
  
  Since these values are unsigned (signed multiplication works differently), the following long multiplication will take place:
```
      08040001
      01010101
      --------
      08040001
    08040001
  08040001
08040001
--------------
080C0C0D050101
```
  But since the overflows have no effect in unsigned multiplication, the process becomes:
```
08040001
01010101
--------
08040001
040001
0001
01
--------
0D050101
```
  Notice how the leftmost column is the addition of all the contributions, meaning that that’s our result, which does check out: we were looking for 0b1101 aka 0xD
  
  Also notice how there could be no carries under this multiplication.
3. Shift right 24:
  
  v >>= 24
  
  And so we have our result.

Pad every bit in a value with two zeroes#

Motivation#

3-D morton curves.

Function#

inline fn pad_3(v: u4) u12 {
    var w: u48 = @intCast(v);

    w *= 0x001_001_001_001;
    w &= 0x008_004_002_001;
    w *%= 0x001_004_010_040;

    return @intCast(w >> 36);
}

inline fn pad(v: anytype) @Type(std.builtin.Type{ .Int = .{
    .bits = @typeInfo(@TypeOf(v)).Int.bits * 3,
    .signedness = .unsigned,
} }) {
    const T = @TypeOf(v);
    const v_bits = @typeInfo(T).Int.bits;
    const full_nibbles = v_bits / 4;
    const excess_bits = v_bits % 4;

    const Ret = @Type(std.builtin.Type{ .Int = .{
        .bits = v_bits * 3,
        .signedness = .unsigned,
    } });

    var ret: Ret = 0;

    //aaaabbbbccc
    //0   1
    //1   0
    inline for (0..full_nibbles) |i| {
        const j = full_nibbles - i - 1;
        const shr = j * 4 + excess_bits;

        const nibble: u4 = @intCast((v >> @intCast(shr)) & 0xF);
        const padded_nibble: Ret = @intCast(pad_3(nibble));

        ret <<= 12;
        ret |= @intCast(padded_nibble);
    }

    if (excess_bits != 0) {
        const mask = (@as(T, 1) << excess_bits) - 1;
        const excess: u4 = @intCast(v & mask);
        const padded_excess: Ret = @intCast(pad_3(excess));

        const omask = (@as(Ret, 1) << (excess_bits * 3)) - 1;
        ret <<= excess_bits * 3;
        ret |= padded_excess & omask;
    }

    return ret;
}

Explanation#

assume v = 0b1101 or 0x0C

copy the value into all 12 bit sections of a 48 bit number:

v *= 0x001001001001

so v became 0x00C00C00C00C
filter out the contributed bits alone

What do I mean by this? The end goal is to have a 48-bit number with 4 12-bit groups, wherein the nth bit is set if the nth bit in the original value is set. 0b0001 becomes 0x000000000001 and 0b1010 becomes 0x004000002000.

v &= 0x008004002001

v became 0x008004000001
shift & sum

So in the original value, we want to end up having shifted the 0th bit left by 0, the 1st bit by 2, the 2nd by 4 and the 3rd by 6, right? Those shifts amount to multiplications by 1, 4, 16, and 64. And remember from the first hack what multiplication can do:
```
008004000001
001004010040
------------
200100000040
040000010
000004
001
```
Notice how the multiplier being “reversed” is actually correct. And notice how the last column, once again, represents our desired result: 0x200 is 0b1000 shifted left by 6, 0x040 is 0b0100 shifted left by 4 and 0x001 is 0b0001 shifted left by, well, 0.

So the operation here is: v *= 0x001004010040
shift the last column right

v >>= 36

And we’re done

Table of Contents

Detect Which Nibbles or Bytes in a Word Is Nonzero#

Motivation#

Functions#

64 bits into 8 bits, grouped by 8 bits#

32 bits into 4 bits, grouped by 8 bits#

32 bits into 8 bits, grouped by 4 bits#

Explanation#

Pad every bit in a value with two zeroes#

Motivation#

Function#

Explanation#