This video goes into the solution of an Advent Of Code Challenge posted in 2022. It is about finding a 14 character substring that has 14 unique characters. One of the speedups is to represent the characters in the window as bits instead of a hashset or an array.
Why a 32-bit number?
We are dealing with 26 letters, so we need at least 26 bits to keep track if the letter is present in the window or not. Clearly a 16-bit number is too short, 32-bit number is the smallest number that can contain at least 26 bits.
Which bits are used?
As mentioned in the video, each character has a numeric representation.
Easy way to map a character to a specific bit, is calculating the remainder by division by 32 (so called mod 32). In case of the character a
, it is mapped to bit 1 as 97 divided by 32 is 3 and remainder 1.
Character | Numeric | mod 32 |
---|---|---|
a | 97 | 1 |
b | 96 | 2 |
… | … | … |
y | 121 | 25 |
z | 122 | 26 |
From the above table:
- bit 0 (the bit to the far right) is not used;
- bits 1 through 26 keep track of the presence of
a
throughz
; - bits 27 through 31 are not used.
Examples
To make it more clear, here are some examples. Redundant zeros on the left are not being shown, so the bit representation in the table can have less than 32 characters.
Bit representation | numeric representation | characters found |
---|---|---|
10 | 2 | a is found |
100 | 4 | b is found |
100000000000000000000000000 | 67108864 | z is found |
110 | 6 | a and b are found |
100000000000000000000000010 | 67108866 | a and z are found |
100000000000000000000000110 | 67108870 | a, b and z are found |
High level approach
Let’s get down at the problem at hand. We have a string of 14 characters and we need to check if all characters are unique.
- Initialize the
state
variable to 0 (so no character have been found so far); - for each character
c
in the string- compute the modulo 32 of
c
. This will be the position of the bit in the state we need to check - let’s call itmodBit
; - generate the number corresponding to
modBit
by shifting 1 to the left formodBit
positions - let’s call itmodValue
; - We can now check if we already encountered
c
. Let’s do theOR
operation onmodValue
andstate
and save it innewState
;- if
newState
is different thanstate
, it means we have flipped a bit to 1. So we haven’t encounteredc
before - if
newState
is equal tostate
, it means we have not flipped a bit to 1. This means the bit on positionmodBit
was already 1, meaning we already encounteredc
- if
- compute the modulo 32 of
Walkthrough
Let’s test the string azb
for uniqueness of characters. Each row in the table is a step of the high level approach.
c | modBit | modValue | state | newState | state != newState |
---|---|---|---|---|---|
a | 1 | 2 | 0 | 2 | true |
z | 26 | 67108864 | 2 | 67108866 | true |
b | 2 | 4 | 67108866 | 67108870 | true |
So at the end of the string, we had only unique characters.
But what if we check the string azbz
?
c | modBit | modValue | state | newState | state != newState |
---|---|---|---|---|---|
a | 1 | 2 | 0 | 2 | true |
z | 26 | 67108864 | 2 | 67108866 | true |
b | 2 | 4 | 67108866 | 67108870 | true |
z | 26 | 67108864 | 67108870 | 67108870 | false |
When checking the second z
, the state
value did not change meaning z
was already encountered, hence the characters in the string are not unique.