Chess Move Generation with Magic Bitboards

“Magic bitboards” are a common technique in computer chess engines for move generation; most modern chess engines use some variant of the technique. While there are plenty of explanations online for how the related techniques of “bitboards” and “rotated bitboards” work, I had a hard time finding an explanation of how and why magic bitboards work. All of them simply lay out the implementation with little explanation, or with explanation steeped in detailed chess programming jargon, inappropriate for a newcomer.

So here’s my attempt at an introduction, geared towards people with an existing background in software engineering, but not chess programming. This is not an in-depth discussion; there is not enough information here to go and implement anything, but hopefully there is enough information to understand what you’re seeing if you go do further research. I am, myself, in no way an expert on this topic, which hopefully means I stand more of a chance of explaining the basics to beginners!

If you want another, somewhat more rigorous, attempt at explaining this, I recommend Pradyumna Kannan’s paper on the subject. Repeated re-readings of that paper is what finally made this all fit together for me.

Rotated Bitboards

In order to understand magic bitboards, we first need to understand rotated bitboards.

Modern desktop computers have 64-bit registers, and conveniently, there are 64 squares on a chess board, so we can use this to our advantage to work with chess boards extremely efficiently.

We can think of each bit in a 64-bit integer as representing a single square on a chess board. The first bit is A1, the eighth bit is H1, the ninth bit is A2, all the way up to the sixty-fourth bit being H8. We use a series of these 64-bit integers, one for each piece of each colour, to represent a full state of a chess board; we set a bit in the integer to 1 if there is a piece of that colour on that square, and set it to 0 otherwise. (The “first bit” can be either the MSB or LSB; both work, and the distinction won’t matter for this overview.)

These 64-bit integers are called “bitboards”.

For example, the bitboard representing white knights in the starting position is this 64-bit integer, represented in binary (into which I’ve inserted suggestive line breaks):

00000000
00000000
00000000
00000000
00000000
00000000
00000000
01000010

Or, since 0 and 1 get visually noisy, I’ll represent boards like the above like this — remember that despite the notation these are all just 64-bit integers:

........
........
........
........
........
........
........
.1....1.

Okay, we can represent a chess board compactly — a single 64-bit word, something a CPU can process quite efficiently. How can we take advantage of this for move generation?

In order to generate all moves that, say, white can make, we can loop over each bitboard for each white piece. We can then loop over each bit in each of those bitboards (something, again, modern CPUs can do quite efficiently for 64-bit integers), generating moves for just that one specific piece. After all this is done, we have all moves for all pieces for white. In our example, generating moves for white at the beginning of the game, we’d thus loop over white’s pawn bitboard, bishop bitboard, etc, looping over every piece in those boards, collecting up all possible moves; we’d eventually get to white’s knight bitboard, pictured above, and loop over the two set bits, representing the knights on B1 and G1.

Since we can efficiently loop over all the pieces, what we need is an efficient way to generate moves for a specific piece on a specific square. (“Give me all the moves for a knight on B1” in our example.) For knights, this is pretty easy: their available moves are quite unaffected by other things on the board — even if there is a piece where they want to move, they can just capture it, so that square is still legal for them to move to. Thus, we can use a simple, precomputed lookup table that takes a square and gives us another bitboard representing all possible locations the knight could end up. We then, again, loop over all set bits in that bitboard, giving us all possible destinations for the knight, thus all possible moves for the knight.

In our example, the result of this lookup table for a knight on B1 is a bitboard which looks like this:

........
........
........
........
........
1.1.....
...1....
........

Which represents that a knight on B1 could end up on A3, C3, or D2.

There are some fiddly details here — we need to make sure we don’t capture our own pieces, leave our king in check, things like that — but this is how the basic technique works. And in fact, the basic technique works pretty similarly for kings and pawns, too. We need to generate all these lookup tables, but we can do it once during program startup (or even at compile time), so that doesn’t affect performance of move generation.

But it doesn’t work for the sliding pieces: rooks, bishops, and queens. Their available moves depend heavily on where other pieces are on the board. We can’t use a simple lookup table based just on the piece’s location — saying “there is a rook on D4” is not enough information to tell you where that rook can move to (unlike with a knight).

We can still use a precomputed lookup table, but we have to make it a little more sophisticated. Consider our rook on D4. Let’s start by figuring out how we can compute its horizontal movements along the fourth rank. The key observation is that only pieces on the fourth rank affect how a rook moves along the fourth rank (and that we don’t care what those pieces are, only where they are). We can easily track this information by maintaining another bitboard, a full composite of all pieces of all colours on the board, and then extract the information about the fourth rank via a simple bitshift and bitmask. We can then use this as an additional index into our lookup table, i.e., we index our lookup table by both the piece’s position as well as by the 8-bit bitmask for the occupied squares on the current rank.

For example, let’s suppose again we have a rook on D4, and that the full composite bitboard, of every piece of every colour, looks like this:

......1.
.....111
.111....
........
...1.1..
........
11....11
......1.

Most of that doesn’t actually matter — we only care about the bits representing the fourth rank, which we shift and mask down to the following occupancy:

...1.1..

We then access the precomputed rook move table, for square D4, for that 8-bit occupancy, and get back:

111.11..

Which means that the rook can move to A4, B4, C4, E4, or F4, but not G4 or H4 (because it’s blocked by the piece on F4).

Again, there are fiddly details, and we have to generate these lookup tables — which are 256 times larger than the previous tables — but the basic idea works.

That all works nicely for horizontal movements along a rank, but what about vertical movements along a file? We can actually re-use exactly this trick if we additionally keep a 90-degree rotated version of the composite bitboard, which is a bit of extra bookkeeping but not complicated. Now each file is a contiguous 8 bits and can easily be extracted. And, it turns out, we can use a similar trick for bishops too, with 45-degree and 135-degree rotated composite bitboards! Again lots of fiddly details — it may not be entirely obvious just how a “45-degree rotated board” works — but hopefully this is enough to see that this can be made to work.

Unsurprisingly, this technique is called “rotated bitboards”.

Magic Bitboards

Rotated bitboards are nice, but it’s kind of unfortunate that we have to track the, well, rotated versions of the board. And the shift-and-mask-and-lookup gymnastics get a bit odd, especially for 45-degree and 135-degree rotated boards. And we have to do all of this for each direction a sliding piece can move (twice for rooks and bishops, four times for queens). Can’t we just do a table lookup directly, without all of these shifting and masking and rotated board shenanigans?

What if we did something really naive? For each square, we could have a table which took in the entire 64-bit composite occupancy bitboard and returned the possible moves. Unfortunately, the space requirement would be gigantic (tens of thousands of petabytes), so we can’t do that exactly. But with a couple of key observations, we can get this down to something tractable.

The first key observation is the same as we originally started with in rotated bitboards: for our rook on D4, only pieces on the D file and on the fourth rank matter. We can mask the other bits to zero, dramatically decreasing the number of possible inputs we’ll have to deal with. For example, for our rook on D4 with the same composite occupancy as earlier:

......1.
.....111
.111....
........
...1.1..
........
11....11
......1.

We don’t care about anything outside the D file and fourth rank, so we can mask that down to:

........
........
...1....
........
...1.1..
........
........
........

The second key observation is that, for many inputs, the output is identical. For example, the available moves for our rook on D4 is the same for the above as for the below:

........
...1....
...1....
........
...1.1.1
........
........
........

Those extra pieces on H4 and D7 don’t matter, because our rook is already blocked by the pieces on F4 and D6. The output — the moves our rook on D4 can take — is the same whether they are there or not.

These two combine into the core idea of magic bitboards: what if, instead of a direct lookup table, we use something more like a hash table? Instead of a 64-bit index for the direct lookup, we’ll only need a few bits, obtained by hashing our (masked-down) input occupancy bitboard. Furthermore, we can use a fast, shitty hash function with tons of hash collisions, as long as it only collides on inputs which map to the same output — which, given the constrained possibilities for inputs, and the high number of inputs which map to the same output anyway, is not that onerous of a requirement.

That’s how magic bitboards work: they map a (masked-down) input board to an output set of moves, via a hash table, which uses a shitty-but-fast hash function which takes advantage of hash collisions to reduce the size of the lookup table.

function rook_moves(composite_board, square) {
  masked_composite = composite_board & rook_masks[square];
  hash_key = shitty_hash(masked_composite, square);
  return rook_table[square][hash_key];
}

We’ll need two similar functions and sets of tables, one for rooks and one for bishops. (Queens just combine the two.)

For the shitty hash function, we use a function which multiplies by a magic number, followed by a right shift. The idea is that the magic number “mixes” the bits of the input in a quasi-random way, but biases them towards the high bits (because the number gets larger due to the multiplication), so we shift the “interesting” parts back down.

function shitty_hash(masked_composite, square) {
  return (masked_composite * magics[square]) >> shifts[square];
}

Picking different magic numbers just shuffle around the bits differently, letting us find different combinations of hash collisions — remember we need to make sure that input boards which map to different output moves don’t collide. Picking different shifts lets us trade off the size of the hash table with how “good” our magic number needs to be: with a bigger shift, there are fewer output bits, resulting in a smaller final table — but because there are fewer output bits for the same number of inputs, there are more hash collisions, and so it’s harder to find a magic number which avoids unsuitable collisions.

Finally, that leaves finding suitable magic numbers and shifts. Which is done… by brute force guess-and-check. We can write a “magic number finder” program which picks a random magic number and a large shift, computes all the inputs and outputs, and see if we end up with any unsuitable hash collisions (i.e., two inputs hash to the same value but should have different outputs). If that happens, it decreases the shift and tries again; if the shift gets too small, it picks another random magic and a large shift, etc. We then hardcode the “best” magic numbers (largest shifts) into our main chess engine.

Pradyumna Kannan has started this work for us; magicmoves.zip contains a suitable table of magic numbers and shifts, as well as some supporting code to make use of them. While there may be better numbers — and even improved variations of this basic technique — today, those tables are simple and good enough to start with.

Many explanations jump straight to this multiply-shift-lookup line, complete with magic numbers. Hopefully this gives you some idea of why this works, where the magic numbers come from, etc.

A fun parting fact: Stockfish, one of the strongest chess engines in the world, uses a variation of magic bitboards for its move generation, but if you look in its source code, you won’t find tables of magic numbers as described above. Instead, it runs the “brute-force search for suitable magic” algorithm every time at program startup and encodes a specific PRNG value from which to start, knowing it will quickly converge on a table of suitable numbers!