Solving NP-hard puzzles with the oldest trick in the book

408 points2
dietrichepp23 minutes ago

Nice article! I used similar techniques to find solutions for a game called “DStar”… including the choice to write the solver in Rust. You can play the game and see the optimal solutions on the web (not tested on mobile):

A state in this game is:

  enum Active {

  struct State {
    coins: u32,     // bitmask of which coins remain
    active: Active, // which player is active
    ball: Point,    // location of ball
    block: Point,   // location of block
I thought about writing an A* solver for this, but a simple BFS found all the solutions quickly enough. With a single-threaded solver, each level could be solved in 40s or less. The longest solutions are around 100 moves long.
sriram_malhar11 hours ago

Very nicely written. In addition, the example chosen was itself lovely to play with.

Explicit-state model checkers do this at scale. Readers may be interested in the internals of the TLA+ model checker, esp. the encoding of the state and dealing with disk.

Model Checking TLA+ Specifications

by Yuan Yu, Panagiotis Manolios, and Leslie Lamport (1999)

robinhouston47 minutes ago

This is an interesting puzzle. It not obviously in NP. My guess would be that it’s PSPACE-complete. Have you thought about the complexity at all?

taintegral40 minutes ago

I gave the thought some idle time, but it's been so long since I've constructed a proper hardness proof. If I do, I'll definitely make a post about it!

myaccount809 hours ago

Very well written. Unfortunately I already tried all these kind of tricks in some competitions but others were still able to beat my code. Are there additional resources for improving it even further?

Radim8 hours ago

Cache + avoid dynamic allocations like the plague. For example, high-perf search algos implement reversible transitions: instead of creating & pushing new states around all the time, modify just one state, in-place. And then apply the same transformation in reverse when backtracking.

If you design your data structures well – to reflect the required transitions and query operations, rather than what the problem looks like to a human when drawn on a piece of paper – the forward/backward transition is nearly a no-op. Just some binary bit fiddling over data that's already in a CPU cache. And there's NO DYNAMIC ALLOCATIONS at all. Your search will fly!

The OP also mentions another great "caching" technique, under "Entropy reduction". This is really hard but basically try to find symmetries in the search space which allow you to prune away entire subspaces apriori, without searching at all. Often it'll be something like "rotating by 90 degrees leads to the same position", mirror positions, invariance to color, time… the symmetry types and their runtime benefits are problem-specific, so you need to sit and think hard about what makes a solution unique.

In the limit, you may be lucky enough to prune away so much of the solution space that there's only a single state left. Congratulations: you've solved the problem analytically :)

twanvl3 hours ago

Depending on the type of puzzle, it might be possible to work backwards. For example here, if you can compute reverse-transitions you could run one or two steps of that, and put all states from which the goal can be reached in two moves into a HashMap. Then you run the forward search and stop as soon as you hit anything in that HashMap. In theory, you can reduce the runtime to find an n-move solution from O(bf^n) to O(bf^{n/2}), at the cost of more memory use. This can also be combined with A*.

Low-level optimization can be worth it:

* You can try to pack the game state into integers and use bitwise operations. An 8×8 board can be stored as a 64 bit vector, so a `u64`. If you know the edges of the board are never occupied, then moving around can be as simple as a bit shift (probably not for this game).

* A smaller state representation also means that HashMap lookups will be faster.

* Instead of using a pair of integers to represent a position, use a single integer and save a multiplication for every lookup into a grid.

* Add a ring of impassible cells around the board, instead of checking for the edges of the board each time.

taintegral2 hours ago

This is a great idea, and the low-level optimization tips are all excellent ones I have used in the past. I want to talk a little bit more about using bidirectional A* though, because I think it's very interesting. It's a great strategy in general, but this may be a case where it doesn't do as well.

Working backwards for this particular puzzle is very difficult because on each turn an actor may or may not move. This effectively increases the branching factor from 4 (one for each direction) to 4 * 2^n (for each of four directions, each actor may or may not have moved). In practice it would be lower than that upper bound, but it could still be significantly higher than the forward branching factor. A nice visualization for this to think of your start and end states as points in space, and your A* searches as cones emitting from one point and growing toward the other. The angle of the cone would be roughly approximate of your branching factor, and when your cones meet each other or a point the search is done. If your branching factor is the same forwards and backwards, you can travel through much less space by searching forwards and backwards simultaneously. However, if your backwards branching factor is higher then the cone from the end state will be much broader. This could travel through much more space than just doing a forward search.

This kind of behavior is very evocative one-way functions, and makes me think it might be related to NP-hardness in some way. I'm really not qualified to prove these kinds of statements though. Maybe someone else can offer a more rigorous mathematical perspective?

hairtuq47 minutes ago

For the quite similar puzzle Atomix, it also seems like the branching factor would be much higher for backward search because upper bounds are weaker, but you can show that on average the branching factor is actually the same [1]. I wonder if the same argument would work here.

[1] Section 5.5

sam0x177 hours ago

Haven't looked at this particular game closely but had a similar experience to OP with an Othello AI competition back in college. There were four killer features for me that allowed me to win (and to apparently keep winning for years after I left, according to my prof)

1. having a really good and creative heuristic. The one I used ended up taking 6 different ideas I had for heuristics and combining them together in a weighted average based on their performance in a randomized trial I conducted between the 6. My vague recollection is that slightly over-valuing 4-corners positions performs unexpectedly well in Othello, but there was a lot more to it than that. The actual effectiveness of various heuristics changes over time as the game goes on, though I never modeled or attempted to exploit this.

2. Knowing the exact memory and execution time bounds on my prof's machine and setting things up so that I can terminate exactly when the time is ~5ms away from running out. We were limited to exactly 1 second per turn.

3. Caching. This was especially important in my case since I was technically using 6 different heuristics. I actually pre-generated a cache of the 100 most popular gamestates I encountered during my randomized trials, and this vastly increased the average depth I was able to explore in the allotted calculation time for one turn (1 second), especially during early game.

4. This is a continuation of 3, but it's super important if you have a turn based game with execution time limits to not throw away your work between turns. If you can modify your search so that it is pausible / resumable (which you can do with some rather simple multi-threading), and then define a simple routine that lets you resume a previous search by quickly modifying the tree and then resuming instead of starting an entirely new one, you are going to explore much much more. This optimization even with a crappy heuristic is going to win 99% of the time against opponents who don't use it.

One thing I didn't explore but wish I had was trying to predict which heuristic in my library of heuristics is closest to that of my opponent, and then opting for a strategy that is most likely to beat that heuristic. This would look something like you calculate each turn what the most likely opponent heuristic is based on their moves so far, and then have a pre-computed table of each heuristic's "foil". Maybe this would only kick in after several turns. An even better version of this would probably be to just use the probabilities for each heuristic as the weighted importance of each respective foil, and use all the foils together in a weighted average.

Fun fact: this was all in Java at the time. I can only imagine what havoc one could wreck with this sort of approach in Rust.

maxwells-daemon5 hours ago

Caltech, right? Were you the author of Flippy? There's a new generation of AlphaZero-style ML bots that's managed to finally dethrone it :)

I've been working on and off on a Rust Othello bot aiming to combine AlphaZero in the midgame with a fast endgame solver [1]. Probably the coolest feature that's currently finished is that valid moves are generated and executed with SIMD instructions, so searching a new position only takes a few clocks on a modern cpu.


saagarjha7 hours ago

Java is generally not too bad for competitive programming. Of course C++ will easily beat it, but generally only by a factor of 2x or so. Unlike Python, which can easily be 5-10x worse…

matsemann6 hours ago

My experience from these kind o tournaments as well. The order of magnitude difference meant that someone using java/c++ could search one or two moves deeper than those usin python, winning even with suboptimal implementations/heuristics.

taintegral8 hours ago

In my experience, the only way to make meaningful progress on performance from here on out is to:

- Squeeze out more entropy (for example, rotating states for symmetric boards)

- Make the heuristic function smarter (for example, by calculating the assignment bottleneck)

I wrote a Carcassonne solver once and found many little optimization opportunities by detecting fail states early for example. Avoiding dead ends saves a massive amount of time.

coolgeek24 hours ago

This is a good article. It deserves a lot more attention than it got.

taintegral21 hours ago

Thank you! As long as some people got to see it, that's enough for me. :)

dang13 hours ago

We'll put it in the second-chance pool (, explained at, so it will get a random placement on HN's front page.

p.s. if you don't mind, could you please put your email address in your profile? That way we can send you a repost invite in the future, which is the way we do this if the post is older than a few days. And even if it's not that old, we sometimes still email a heads-up.

RicoElectrico7 hours ago

The frequency of using that workaround is telling how much valuable content gets under the radar of the HN hivemind. If a submission is 1) not about FAANG, entrepreneurship, programming language du jour, current events, or HN's idols or 2) is posted from the wrong time zone - it's often dead on arrival. I mean, there are counterexamples on the frontpage, but it's only a tip of the iceberg compared to what you can get in /new after filtering all the spam and fluff.

HN implicitly positions itself as "the smarter Reddit" but in my experience most subreddits of value don't have strong time zone bias. Anything that doesn't force me to post in the "SV programmers are slacking off" time window and compete for attention with a bazillion other posts would be welcome.

dang2 hours ago
taintegral10 hours ago

Done! Thanks for the boost, I appreciate it very much. :)

Ntrails9 hours ago

It is a lovely piece, though I had an advantage of knowing ricochet robots already (which I immediately started thinking about solving).

User2312 hours ago

This is the kind of gem that keeps me coming back to HN. Thanks!

veselin10 hours ago

This reminds me of the problems we did 15-20 ago at IOI or ACM ICPC. We did these in pure C then, sometimes C++.

I would have kept the states in a different way. Instead of making a vector/array of actors, I would make a pair of bitvectors the size of the grid. 1 is set if there is a blue (resp. red) actor at that position. No sorting is needed and it seems that for more practical puzzles this gives smaller state. All move operations are still easy to implement.

taintegral10 hours ago

That would definitely work, and I’d be interested in the performance impact. This was written so that the state size would scale with the number of actors rather than the size of the grid. There is a degenerate case where a massive mostly empty grid becomes difficult not only to store in memory, but also to transition on move. The transition function would take time proportional to the size of the grid rather than the number of actors.

lalaland112511 hours ago

I think it would be super neat to also compare this to a generic SAT solver based solution.

cerved6 hours ago

or CP solver

whatever11 hour ago

Or an Integer Programming solver

matsemann6 hours ago

The transitions can often be sped up by not cloning and modifying state, but instead keeping track of all transitions and rollbacking. Not sure if it's doable here, since undoing a LEFT cannot know if the box was already the wall. So might need some extra bookkeeping. But for instance when solving 8-queens, sudoku or similar for huge grids, just walking back up the tree of transitions and undoing and reapplying stuff yields an immense speedup.

Edit: I see Radim mentions the same in a response to someone else.

petters10 hours ago

How is this puzzle NP-hard? Genuine question, because of the number of pieces is bounded, it's solvable with a shortest path in a graph with a polynomial number of nodes.

contravariant4 hours ago

Wouldn't the size of the graph grow something like $n^k / k!$ ? This is not polynomial in $k$.

taintegral10 hours ago

Bounding the maximum number of actors is just an optimization for the cases we want to solve. Of course if you wanted to really solve any case you would need infinite space, and that’s not achievable either. If you desired, you can also just omit that particular optimization. :) I think a better way to phrase it is that we are writing a solver for a reasonable subset of inputs to an NP-hard problem.

grandpa4 hours ago

  git clone --branch start
  Cloning into 'anima_solver'...
  fatal: Remote branch start not found in upstream origin
taintegral3 hours ago

This is fixed now.

antman11 hours ago

Really great write up, some comment in it why in A* we need the heuristic to be the way it is. A* is a generic algorithm but given other problem knowledge i.e. the flat game map, an alternative algorithm selection and testing process would be a great addition. Overall great though.

t3chn0l0g1c6 hours ago

Very nice article. I´ve spent quite some time optimizing a solver for, but due the insanely large search space in the larger puzzles its quite not as easy. Can recommend that for anyone looking for a challenge.