Comparison of human scores and “human scores” in Atari

Below is a list of the human scores reported for Atari games in the reinforcement learning (RL) community, alongside the highest scores reported on TwinGalaxies. The two sets of scores are achieved under different conditions. The RL community tests humans on an emulator using a keyboard, the game can last at most 5 minutes, and there’s no sound (i.e. they replicate the conditions that the algorithm experiences). It also reports the average, not the maximum, of the values obtained. TwinGalaxies, on the other hand, is a site that collects records, so it is only interested in maximum values. The differences between the scores tend to be very big, often more than a factor of 100.

I think the maximum values achieved are important benchmarks, reflecting the levels that can be achieved by a human brain finely trained on a task. It’s also quite apparent that, even though the reported records are maxima, the average score for the people who achieved these records would not be anywhere near as low as the averages reported for humans in the RL literature. The five-minute limit will become a problem when trying to beat these top scores, but that time limit can be removed. It doesn’t tend to be a bottleneck in most cases at the moment, since the agents can rarely play for five minutes.

In the following table, “DNADRL human” refers to the human scores reported in “Dueling Network Architectures for Deep Reinforcement Learning” by Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot and Nando de Freitas (https://arxiv.org/abs/1511.06581). Similar scores are seen in other papers in the literature. “TwinGalaxies (any)” is the highest score for that game with difficulty setting B, achieved on either a console or an emulator, although the records are almost all console scores. “Factor” is just the TwinGalaxies score divided by the DNADRL score.

| Game | DNADRL human | TwinGalaxies (any) | Factor | Link | Notes |
|---|---|---|---|---|---|
| Alien | 7,127.7 | 255,265 | 35.8 | http://www.twingalaxies.com/scores.php?scores=2316 | |
| Amidar | 1,719.5 | 104,159 | 60.6 | http://www.twingalaxies.com/showthread.php/140175 | |
| Assault | 742.0 | 8,647 | 11.7 | http://www.twingalaxies.com/scores.php?platformid=5&gamename=Assault | |
| Asterix | 8,503.3 | 1,000,000 | 117.6 | http://www.twingalaxies.com/showthread.php/150583 | |
| Asteroids | 47,388.7 | 10,004,100 | 211.1 | http://www.twingalaxies.com/scores.php?scores=2315 | |
| Atlantis | 29,028.1 | 10,604,840 | 365.3 | http://www.twingalaxies.com/scores.php?platformid=5&gamename=Atlantis | |
| Bank Heist | 753.1 | 82,058 | 109.0 | http://www.twingalaxies.com/scores.php?platformid=5&gamename=Bank%20Heist | |
| Battle Zone | 37,187.5 | 1,545,000 | 41.5 | http://www.twingalaxies.com/scores.php?scores=2737 | |
| Beam Rider | 16,926.5 | 999,999 | 59.1 | http://www.twingalaxies.com/scores.php?scores=2738 | |
| Berzerk | 2,630.4 | 1,057,940 | 402.2 | http://www.twingalaxies.com/showthread.php/149853 | |
| Bowling | 160.7 | 300 | 1.9 | http://www.twingalaxies.com/showthread.php/159727 | |
| Boxing | 12.1 | 01:42.0 | | http://www.twingalaxies.com/scores.php?scores=740 | Time remaining after getting to 100 is the measurement of choice |
| Breakout | 30.5 | 864 | 28.3 | http://www.twingalaxies.com/scores.php?scores=5389 | 864 is the maximum score |
| Centipede | 12,017.0 | 1,301,709 | 108.3 | http://www.twingalaxies.com/showthread.php/167433 | |
| Chopper Command | 7,387.8 | 999,999 | 135.4 | http://www.twingalaxies.com/showthread.php/140559 | |
| Crazy Climber | 35,829.4 | 219,900 | 6.1 | http://www.twingalaxies.com/showthread.php/167082 | |
| Defender | 18,688.9 | 5,443,150 | 291.3 | http://www.twingalaxies.com/scores.php?scores=2296 | |
| Demon Attack | 1,971.0 | 1,556,345 | 789.6 | http://www.twingalaxies.com/scores.php?scores=2295 | |
| Double Dunk | -16.4 | | | | |
| Enduro | 860.5 | 118,000.00 km | 137.1 | http://www.twingalaxies.com/scores.php?scores=2106 | Different scoring, and top score is 30 times the second! |
| Fishing Derby | -38.7 | 71 | -1.8 | http://www.twingalaxies.com/scores.php?scores=13743 | |
| Freeway | 29.6 | 38 | 1.3 | http://www.twingalaxies.com/showthread.php/145169 | |
| Frostbite | 4,334.7 | 1,832,730 | 422.8 | http://www.twingalaxies.com/scores.php?scores=3770 | |
| Gopher | 2,412.5 | 829,440 | 343.8 | http://www.twingalaxies.com/scores.php?scores=3775 | |
| Gravitar | 3,351.4 | 999,950 | 298.4 | http://www.twingalaxies.com/scores.php?scores=3777 | |
| H.E.R.O. | 30,826.4 | 1,000,000 | 32.4 | http://www.twingalaxies.com/showthread.php/157083 | |
| Ice Hockey | 0.9 | 36 | 40.0 | http://www.twingalaxies.com/showthread.php/167393 | |
| James Bond | 302.8 | 45,550 | 150.4 | http://www.twingalaxies.com/scores.php?scores=13897 | |
| Kangaroo | 3,035.0 | 1,424,600 | 469.4 | http://www.twingalaxies.com/scores.php?scores=3788 | |
| Krull | 2,665.5 | 1,245,900 | 467.4 | http://www.twingalaxies.com/scores.php?scores=3793 | |
| Kung-Fu Master | 22,736.3 | 1,000,000 | 44.0 | http://www.twingalaxies.com/scores.php?scores=3794 | |
| Montezuma’s Revenge | 4,753.3 | 1,219,200 | 256.5 | http://www.twingalaxies.com/scores.php?scores=3808 | |
| Ms. Pac-Man | 6,951.6 | 2,654,680 | 381.9 | http://www.twingalaxies.com/scores.php?scores=3816 | |
| Name This Game | 8,049.0 | 25,220 | 3.1 | http://www.twingalaxies.com/scores.php?scores=3817 | |
| Phoenix | 7,242.6 | 4,014,440 | 554.3 | http://www.twingalaxies.com/scores.php?scores=3829 | |
| Pitfall! | 6,463.7 | 114,000 | 17.6 | http://www.twingalaxies.com/showthread.php/146051 | |
| Pong | 14.6 | | | | Understandably, no records are kept |
| Private Eye | 69,571.3 | 118,000 | 1.7 | | Game on difficulty A is what is tracked |
| Q*Bert | 13,455.0 | 2,400,000 | 178.4 | http://www.twingalaxies.com/showthread.php/164665 | |
| River Raid | 17,118.0 | 1,000,000 | 58.4 | http://www.twingalaxies.com/scores.php?scores=2133 | |
| Road Runner | 7,845.0 | 2,038,100 | 259.8 | http://www.twingalaxies.com/scores.php?scores=5152 | |
| Robotank | 11.9 | 76 | 6.4 | http://www.twingalaxies.com/scores.php?scores=5270 | |
| Seaquest | 42,054.7 | 999,999 | 23.8 | http://www.twingalaxies.com/scores.php?scores=2136 | |
| Skiing | -4,336.9 | -3,272 | 0.8 | http://www.twingalaxies.com/showthread.php/141803 | I think I’m translating the score right |
| Solaris | 12,326.7 | 57,840 | 4.7 | http://www.twingalaxies.com/showthread.php/140250 | |
| Space Invaders | 1,668.7 | 621,535 | 372.5 | http://www.twingalaxies.com/scores.php?scores=2172 | |
| Stargunner | 10,250.0 | 69,400 | 6.8 | http://www.twingalaxies.com/scores.php?scores=13927 | |
| Surround | 6.5 | | | | |
| Tennis | -8.3 | | | | |
| Time Pilot | 5,229.2 | 112,100 | 21.4 | http://www.twingalaxies.com/scores.php?scores=2530 | |
| Tutankham | 167.6 | | | | |
| Up and Down | 11,693.2 | 75,230 | 6.4 | http://www.twingalaxies.com/scores.php?platformid=5&gamename=Up%20%27N%20Down | |
| Venture | 1,187.5 | 913,200 | 769.0 | http://www.twingalaxies.com/scores.php?scores=2223 | |
| Video Pinball | 17,667.9 | 35,197,952 | 1,992.2 | http://www.twingalaxies.com/scores.php?scores=3032 | |
| Wizard Of Wor | 4,756.5 | 864,500 | 181.8 | http://www.twingalaxies.com/scores.php?scores=3035 | |
| Yars’ Revenge | 54,576.9 | 15,000,105 | 274.8 | http://www.twingalaxies.com/showthread.php/161781 | Maybe a different game |
| Zaxxon | 9,173.3 | 772,400 | 84.2 | http://www.twingalaxies.com/scores.php?scores=3039 | |
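
For anyone wanting to reproduce the Factor column, here is a minimal Python sketch using three rows copied from the table above. The function name `factor` and the `rows` dictionary are just illustrative names of mine. The ratio is easy to read when both scores are positive, but it's worth being careful with games scored in negative numbers: for Fishing Derby the sign flips, and for Skiing the record is better than the RL human average even though the factor comes out below 1.

```python
def factor(record_score: float, rl_human_score: float) -> float:
    """TwinGalaxies record divided by the DNADRL human average."""
    return record_score / rl_human_score


# (DNADRL human, TwinGalaxies record), copied from the table.
rows = {
    "Alien": (7127.7, 255265),
    "Fishing Derby": (-38.7, 71),   # negative baseline: the sign flips
    "Skiing": (-4336.9, -3272),     # both negative: a better record gives a factor < 1
}

factors = {
    game: round(factor(record, rl_human), 1)
    for game, (rl_human, record) in rows.items()
}

for game, value in factors.items():
    print(f"{game}: {value}")
```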

DQN learns video pinball

Like Atlantis, Video Pinball is another game where DQN and its derivatives do very well compared to humans, scoring tens of times higher than us. I looked at the game and trained an agent using Nathan Sprague’s code, which you can get from https://github.com/spragunr/deep_q_rl, to see what was happening.

In summary: patience was what was happening. The secret to the game, or at least how DQN manages to score highly, is very frequent and well-timed bumping, which puts and keeps the ball in a vertical bounce pattern across one of the two point-gathering zones that you can see surrounded by the vertical bars at the left and right near the top of the screen, like this:

It’s not easy to tell but there’s a lot of nudging of the ball going on while it’s in flight.

Here’s the usual learning graph:

On the left is the average score per epoch, on the right the maximum score per epoch. For reference, the DeepMind papers tend to list human scores of around 20,000. As noted at the bottom, top human scores are actually far above this.

The progress video is not very interesting. The untrained network doesn’t know how to launch the ball, so there’s nothing to see. It then learns to launch the ball and perform almost constant but undirected bumps, which get it to around 20,000 points (around “human level”), before discovering its trick, after which the scores take off. There’s probably more noticeable progress along the line related to recovery techniques in the cases where the ball escapes, but it doesn’t feel worth trawling through hours of video to discover them.

In summary, like in the case of Atlantis, I think this game is “solved”, and it shouldn’t be used for algorithm comparison in a quantitative way. It could just be a binary “does it discover the vertical trapping technique?” like Atlantis is about “does it discover that shooting those low fast-flying planes is all important?” and that’s all. The differences in score are probably more to do with the random seed used during the test run.

Unlike in Atlantis, these scores aren’t actually superhuman. TwinGalaxies reports a top score of around 38 million. You can watch this run in its full 5-hour glory here: http://www.twingalaxies.com/showthread.php/173072. This is still far, far above whatever DQN gets.
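
To get a rough feel for what that record means under the 5-minute cap used in the RL literature, here is a back-of-the-envelope sketch. It assumes points accrue at a constant rate over the whole run, which is purely my simplification and almost certainly not how the run actually went, so treat the result as an order-of-magnitude figure only.

```python
RECORD_SCORE = 38_000_000  # "around 38 million", as quoted above
RECORD_HOURS = 5           # length of the record run, as quoted above
CAP_MINUTES = 5            # episode cap used when testing humans in the RL papers

# Assumption: constant scoring rate across the whole run.
points_per_minute = RECORD_SCORE / (RECORD_HOURS * 60)
score_within_cap = points_per_minute * CAP_MINUTES

print(f"about {points_per_minute:,.0f} points per minute")
print(f"about {score_within_cap:,.0f} points in {CAP_MINUTES} minutes")
```

Under that assumption, a pro-rated five minutes of the record run would still be far above the roughly 20,000 listed for humans in the papers.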

I’ve uploaded the best-performing network to http://organicrobot.com/deepqrl/video_pinball_20170129_e41_doubleq_dc48a1bab611c32fa7c78f098554f3b83fb5bb86.pkl.gz. You’ll need to get Theano from around January 2017 to run it (say commit 51ac3abd047e5ee4630fa6749f499064266ee164), since that’s when I trained it and they’ve changed their format since. I think I used the double Q algorithm, but I imagine it doesn’t make much difference whether you use that or the standard DQN.
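
The download is gzip-compressed, so it needs unpacking before use. Below is a small sketch that does only the decompression, using the Python standard library; `gunzip` is a helper name I made up, and the file name in the usage comment is a placeholder. Actually loading the resulting .pkl still needs the older Theano mentioned above, and this sketch doesn’t attempt that.

```python
import gzip
import shutil
from pathlib import Path
from typing import Optional


def gunzip(src: str, dst: Optional[str] = None) -> Path:
    """Decompress a .gz file; by default write next to it without the .gz suffix."""
    src_path = Path(src)
    dst_path = Path(dst) if dst else src_path.with_suffix("")
    with gzip.open(src_path, "rb") as fin, open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # stream, so the file is never fully in memory
    return dst_path


# Usage (placeholder file name):
#   gunzip("network.pkl.gz")  # writes network.pkl alongside the original
```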