DQN in Crazy Climber just cares about speed

Continuing the series on games that DQN and its descendants learn to play well: Crazy Climber. In this game, you climb a building by pushing the joystick up, which raises your arms, and pulling it back down, which pulls your body up. You grab onto the ledges of open windows. The windows occasionally close, and if you are holding onto a ledge when its window closes, you fall.

DQN learns to play this game reasonably well in no time at all. This, I think, is because it receives a reward for every step it climbs. This extremely dense reward suits Q-learning perfectly, and it just learns to climb extremely fast.
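To illustrate why a reward on every step helps, here is a toy sketch. It is not Crazy Climber and not the DQN code: it runs tabular Q-learning on a hypothetical tower where the agent always climbs, and counts how many episodes pass before the bottom floor learns that climbing is worth anything. The function name, the two-action setup and the values of alpha and gamma are all assumptions made for illustration.

```python
def episodes_until_bottom_learns(num_floors, dense, alpha=0.5, gamma=0.9,
                                 max_episodes=1000):
    """Count episodes until Q(bottom floor, climb) becomes positive.

    Toy setup: the agent climbs one floor per step, from floor 0 to the top.
    dense=True pays +1 for every step; dense=False pays +1 only for the
    final step onto the top.
    """
    WAIT, CLIMB = 0, 1
    # q[floor][action]; the extra last entry is the terminal state (value 0).
    q = [[0.0, 0.0] for _ in range(num_floors + 1)]
    for episode in range(1, max_episodes + 1):
        for floor in range(num_floors):
            next_floor = floor + 1
            reached_top = next_floor == num_floors
            if dense:
                reward = 1.0
            else:
                reward = 1.0 if reached_top else 0.0
            # Standard one-step Q-learning update for the CLIMB action.
            # WAIT is never taken, so its value stays at its initial 0.
            target = reward + gamma * max(q[next_floor])
            q[floor][CLIMB] += alpha * (target - q[floor][CLIMB])
        if q[0][CLIMB] > 0.0:
            return episode
    return None


print(episodes_until_bottom_learns(10, dense=True))   # prints 1
print(episodes_until_bottom_learns(10, dense=False))  # prints 10
```

With a reward on every step, the very first update already favours climbing. With a reward only at the top, the value has to creep back down one floor per episode, so a taller tower takes proportionally longer.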

The problem is that it seems to get addicted to just climbing fast, and it ignores almost all of the rest of the game. It is so fast that windows very rarely close on it, so in situations where a human would stop and wait, it just runs through them and gets away with it most of the time.

Already by the end of the first epoch it knows to just leg it up quickly, but it doesn’t know that, once it hits the top of a column, it should move sideways onto one with more floors to climb. It only drifts sideways through random movements, like this:

Learning to move quickly to the open column takes it a couple more epochs.

Crazy climber learning curve

It later seems to learn to hold on to the ledge with both hands when it’s going to get hit by something, like an egg dropped by a bird. If you are not holding on, you fall. It also learns to catch onto the helicopters at the end of each level for bonus points.

What it never really seems to learn is to wait for the windows to reopen, or to move sideways onto a clearer path. Even so, it gets to about 100,000 points on average, thanks to the huge bonuses it gets for completing levels quickly. This is higher than the score of DeepMind’s human testers, which is about 35,000 points. The world record is a surprisingly-low-to-me 219,000, which is probably due to the cartridge having been hard to find in its console days. (While I was writing this post, the Ape-X paper was published claiming a score of around 320,000 in Crazy Climber. This would beat the human record, and it probably means it learnt other tricks.)

The usual learning progress video:

You can download a trained version of the network from https://organicrobot.com/deepqrl/crazy_climber_20151018_e90_54374b2c86698bd8c71ca8d5936404340c0bea2d.pkl.gz . It was trained by running run_nature.py.

To use the trained network, you’ll need to get some old versions of the respective code repositories, because this was trained back in October 2015. Late November 2015 versions of Nathan Sprague’s deep Q-learning code https://github.com/spragunr/deep_q_rl, or commit f8f2d600b45abefc097f6a0e2764edadb24aca53 on my branch https://github.com/alito/deep_q_rl, Theano https://github.com/Theano/Theano commit 9a811974b934243ed2ef5f9b465cde29a7d9a7c5, Lasagne https://github.com/benanne/Lasagne.git commit b1bcd00fdf62f62aa44859ffdbd72c6c8848475c and cuDNN 5.1 work.
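If you only want to open the downloaded file yourself, something like the following minimal sketch should do. It assumes the file is a gzip-compressed pickle, as the .pkl.gz extension suggests. The helper name load_gzipped_pickle is made up for this example and is not part of deep_q_rl. Unpickling will only succeed in an environment where the pinned Theano and Lasagne versions are importable.

```python
import gzip
import pickle


def load_gzipped_pickle(path, encoding="latin1"):
    """Return the object stored in a gzip-compressed pickle file.

    The encoding argument is a Python 3 accommodation for pickles written
    by Python 2; under Python 2, drop it from the pickle.load call.
    Only unpickle files you trust: loading a pickle can run arbitrary code.
    """
    with gzip.open(path, "rb") as handle:
        return pickle.load(handle, encoding=encoding)
```

Calling load_gzipped_pickle with the path of the downloaded crazy_climber_20151018_e90_54374b2c86698bd8c71ca8d5936404340c0bea2d.pkl.gz would then return whatever object was saved at the end of training.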

Comparison of human scores and “human scores” in Atari

Below is a list of the different scores reported for humans on Atari games in the reinforcement learning community and the highest scores reported in TwinGalaxies. The conditions under which these scores are achieved are different: the RL community tests humans on an emulator using the keyboard, the game can only last up to 5 minutes and there’s no sound (i.e. they replicate the conditions that the algorithm experiences). It also reports the average, not the maximum, of the values obtained. On the other hand, TwinGalaxies is a site that collects records, so it is only interested in maximum values. The differences between the scores tend to be very big, often a factor of more than 100.

I think the maximum values achieved are important benchmarks reflecting the levels that can be achieved by a human brain finely trained on a task. It’s also quite apparent that, even though the reported records are maxima, the average score for the people who achieved these records would not be anywhere near as low as the averages reported for humans in the RL literature. The five minute mark will become a problem in trying to beat these top scores, but that time limit can be removed. It doesn’t tend to be a bottleneck in most cases at the moment since the agents can rarely play for five minutes.

In the following table, “DNADRL human” refers to the human scores reported in “Dueling Network Architectures for Deep Reinforcement Learning” by Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot and Nando de Freitas https://arxiv.org/abs/1511.06581. Similar scores are seen in other papers in the literature. TwinGalaxies (any) is the highest score for that game with difficulty setting B achieved in either a console or an emulator, although they almost completely consist of console scores. Factor is just the TwinGalaxies score divided by the DNADRL score.

| Game | DNADRL human | TwinGalaxies (any) | Factor | Link / notes |
|---|---|---|---|---|
| Alien | 7,127.7 | 255,265 | 35.8 | http://www.twingalaxies.com/scores.php?scores=2316 |
| Amidar | 1,719.5 | 104,159 | 60.6 | http://www.twingalaxies.com/showthread.php/140175 |
| Assault | 742.0 | 8,647 | 11.7 | http://www.twingalaxies.com/scores.php?platformid=5&gamename=Assault |
| Asterix | 8,503.3 | 1,000,000 | 117.6 | http://www.twingalaxies.com/showthread.php/150583 |
| Asteroids | 47,388.7 | 10,004,100 | 211.1 | http://www.twingalaxies.com/scores.php?scores=2315 |
| Atlantis | 29,028.1 | 10,604,840 | 365.3 | http://www.twingalaxies.com/scores.php?platformid=5&gamename=Atlantis |
| Bank Heist | 753.1 | 82,058 | 109.0 | http://www.twingalaxies.com/scores.php?platformid=5&gamename=Bank%20Heist |
| Battle Zone | 37,187.5 | 1,545,000 | 41.5 | http://www.twingalaxies.com/scores.php?scores=2737 |
| Beam Rider | 16,926.5 | 999,999 | 59.1 | http://www.twingalaxies.com/scores.php?scores=2738 |
| Berzerk | 2,630.4 | 1,057,940 | 402.2 | http://www.twingalaxies.com/showthread.php/149853 |
| Bowling | 160.7 | 300 | 1.9 | http://www.twingalaxies.com/showthread.php/159727 |
| Boxing | 12.1 | 01:42.0 | | http://www.twingalaxies.com/scores.php?scores=740 (time remaining after getting to 100 is the measurement of choice) |
| Breakout | 30.5 | 864 | 28.3 | http://www.twingalaxies.com/scores.php?scores=5389 (864 is the maximum score) |
| Centipede | 12,017.0 | 1,301,709 | 108.3 | http://www.twingalaxies.com/showthread.php/167433 |
| Chopper Command | 7,387.8 | 999,999 | 135.4 | http://www.twingalaxies.com/showthread.php/140559 |
| Crazy Climber | 35,829.4 | 219,900 | 6.1 | http://www.twingalaxies.com/showthread.php/167082 |
| Defender | 18,688.9 | 5,443,150 | 291.3 | http://www.twingalaxies.com/scores.php?scores=2296 |
| Demon Attack | 1,971.0 | 1,556,345 | 789.6 | http://www.twingalaxies.com/scores.php?scores=2295 |
| Double Dunk | -16.4 | | | |
| Enduro | 860.5 | 118,000.00 km | 137.1 | http://www.twingalaxies.com/scores.php?scores=2106 (Note: different scoring, and top score is 30 times the second!) |
| Fishing Derby | -38.7 | 71 | -1.8 | http://www.twingalaxies.com/scores.php?scores=13743 |
| Freeway | 29.6 | 38 | 1.3 | http://www.twingalaxies.com/showthread.php/145169 |
| Frostbite | 4,334.7 | 1,832,730 | 422.8 | http://www.twingalaxies.com/scores.php?scores=3770 |
| Gopher | 2,412.5 | 829,440 | 343.8 | http://www.twingalaxies.com/scores.php?scores=3775 |
| Gravitar | 3,351.4 | 999,950 | 298.4 | http://www.twingalaxies.com/scores.php?scores=3777 |
| H.E.R.O. | 30,826.4 | 1,000,000 | 32.4 | http://www.twingalaxies.com/showthread.php/157083 |
| Ice Hockey | 0.9 | 36 | 40.0 | http://www.twingalaxies.com/showthread.php/167393 |
| James Bond | 302.8 | 45,550 | 150.4 | http://www.twingalaxies.com/scores.php?scores=13897 |
| Kangaroo | 3,035.0 | 1,424,600 | 469.4 | http://www.twingalaxies.com/scores.php?scores=3788 |
| Krull | 2,665.5 | 1,245,900 | 467.4 | http://www.twingalaxies.com/scores.php?scores=3793 |
| Kung-Fu Master | 22,736.3 | 1,000,000 | 44.0 | http://www.twingalaxies.com/scores.php?scores=3794 |
| Montezuma’s Revenge | 4,753.3 | 1,219,200 | 256.5 | http://www.twingalaxies.com/scores.php?scores=3808 |
| Ms. Pac-Man | 6,951.6 | 2,654,680 | 381.9 | http://www.twingalaxies.com/scores.php?scores=3816 |
| Name This Game | 8,049.0 | 25,220 | 3.1 | http://www.twingalaxies.com/scores.php?scores=3817 |
| Phoenix | 7,242.6 | 4,014,440 | 554.3 | http://www.twingalaxies.com/scores.php?scores=3829 |
| Pitfall! | 6,463.7 | 114,000 | 17.6 | http://www.twingalaxies.com/showthread.php/146051 |
| Pong | 14.6 | | | Understandably, no records are kept |
| Private Eye | 69,571.3 | 118,000 | 1.7 | Game on difficulty A is what is tracked |
| Q*Bert | 13,455.0 | 2,400,000 | 178.4 | http://www.twingalaxies.com/showthread.php/164665 |
| River Raid | 17,118.0 | 1,000,000 | 58.4 | http://www.twingalaxies.com/scores.php?scores=2133 |
| Road Runner | 7,845.0 | 2,038,100 | 259.8 | http://www.twingalaxies.com/scores.php?scores=5152 |
| Robotank | 11.9 | 76 | 6.4 | http://www.twingalaxies.com/scores.php?scores=5270 |
| Seaquest | 42,054.7 | 999,999 | 23.8 | http://www.twingalaxies.com/scores.php?scores=2136 |
| Skiing | -4,336.9 | -3272 | 0.8 | http://www.twingalaxies.com/showthread.php/141803 (I think I’m translating the score right) |
| Solaris | 12,326.7 | 57,840 | 4.7 | http://www.twingalaxies.com/showthread.php/140250 |
| Space Invaders | 1,668.7 | 621,535 | 372.5 | http://www.twingalaxies.com/scores.php?scores=2172 |
| Stargunner | 10,250.0 | 69,400 | 6.8 | http://www.twingalaxies.com/scores.php?scores=13927 |
| Surround | 6.5 | | | |
| Tennis | -8.3 | | | |
| Time Pilot | 5,229.2 | 112,100 | 21.4 | http://www.twingalaxies.com/scores.php?scores=2530 |
| Tutankham | 167.6 | | | |
| Up and Down | 11,693.2 | 75,230 | 6.4 | http://www.twingalaxies.com/scores.php?platformid=5&gamename=Up%20%27N%20Down |
| Venture | 1,187.5 | 913,200 | 769.0 | http://www.twingalaxies.com/scores.php?scores=2223 |
| Video Pinball | 17,667.9 | 35,197,952 | 1992.2 | http://www.twingalaxies.com/scores.php?scores=3032 |
| Wizard Of Wor | 4,756.5 | 864,500 | 181.8 | http://www.twingalaxies.com/scores.php?scores=3035 |
| Yars’ Revenge | 54,576.9 | 15,000,105 | 274.8 | http://www.twingalaxies.com/showthread.php/161781 (maybe different game) |
| Zaxxon | 9,173.3 | 772,400 | 84.2 | http://www.twingalaxies.com/scores.php?scores=3039 |
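
The Factor column can be reproduced with a one-line division. The short sketch below does so for three rows copied from the table; the function name human_record_factor is just a label chosen for this example.

```python
def human_record_factor(record_score, average_human_score):
    """TwinGalaxies record divided by the DNADRL average human score."""
    return record_score / average_human_score


# (DNADRL human, TwinGalaxies) pairs copied from the table.
rows = {
    "Bowling": (160.7, 300),
    "Crazy Climber": (35829.4, 219900),
    "Video Pinball": (17667.9, 35197952),
}

for game, (human, record) in rows.items():
    print(game, round(human_record_factor(record, human), 1))
# prints:
# Bowling 1.9
# Crazy Climber 6.1
# Video Pinball 1992.2
```

The ratio stops being very meaningful when a score is negative, which is why Fishing Derby ends up with a factor of -1.8.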