Mastering Atari, Go, chess and shogi by planning with a learned model

  • 1

    Campbell, M., Hoane, A. J. Jr & Hsu, F.-h. Deep Blue. Artif. Intell. 134, 57–83 (2002).

  • 2

    Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).

  • 3

    Bellemare, M. G., Naddaf, Y., Veness, J. & Bowling, M. The arcade learning environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013).

  • 4

    Machado, M. et al. Revisiting the arcade learning environment: evaluation protocols and open problems for general agents. J. Artif. Intell. Res. 61, 523–562 (2018).

  • 5

    Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 1140–1144 (2018).

  • 6

    Schaeffer, J. et al. A world championship caliber checkers program. Artif. Intell. 53, 273–289 (1992).

  • 7

    Brown, N. & Sandholm, T. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science 359, 418–424 (2018).

  • 8

    Moravčík, M. et al. DeepStack: expert-level artificial intelligence in heads-up no-limit poker. Science 356, 508–513 (2017).

  • 9

    Vlahavas, I. & Refanidis, I. Planning and Scheduling. Technical Report (EETN, 2013).

  • 10

    Segler, M. H., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).

  • 11

    Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction 2nd ed. (MIT Press, 2018).

  • 12

    Deisenroth, M. & Rasmussen, C. PILCO: a model-based and data-efficient approach to policy search. In Proc. 28th International Conference on Machine Learning, ICML 2011 465–472 (Omnipress, 2011).

  • 13

    Heess, N. et al. Learning continuous control policies by stochastic value gradients. In NIPS’15: Proc. 28th International Conference on Neural Information Processing Systems Vol. 2 (eds Cortes, C. et al.) 2944–2952 (MIT Press, 2015).

  • 14

    Levine, S. & Abbeel, P. Learning neural network policies with guided policy search under unknown dynamics. Adv. Neural Inf. Process. Syst. 27, 1071–1079 (2014).

  • 15

    Hafner, D. et al. Learning latent dynamics for planning from pixels. Preprint at https://arxiv.org/abs/1811.04551 (2018).

  • 16

    Kaiser, L. et al. Model-based reinforcement learning for Atari. Preprint at https://arxiv.org/abs/1903.00374 (2019).

  • 17

    Buesing, L. et al. Learning and querying fast generative models for reinforcement learning. Preprint at https://arxiv.org/abs/1802.03006 (2018).

  • 18

    Espeholt, L. et al. IMPALA: scalable distributed deep-RL with importance weighted actor-learner architectures. In Proc. International Conference on Machine Learning, ICML Vol. 80 (eds Dy, J. & Krause, A.) 1407–1416 (2018).

  • 19

    Kapturowski, S., Ostrovski, G., Dabney, W., Quan, J. & Munos, R. Recurrent experience replay in distributed reinforcement learning. In International Conference on Learning Representations (2019).

  • 20

    Horgan, D. et al. Distributed prioritized experience replay. In International Conference on Learning Representations (2018).

  • 21

    Puterman, M. L. Markov Decision Processes: Discrete Stochastic Dynamic Programming 1st ed. (John Wiley & Sons, 1994).

  • 22

    Coulom, R. Efficient selectivity and backup operators in Monte-Carlo tree search. In International Conference on Computers and Games 72–83 (Springer, 2006).

  • 23

    Wahlström, N., Schön, T. B. & Deisenroth, M. P. From pixels to torques: policy learning with deep dynamical models. Preprint at http://arxiv.org/abs/1502.02251 (2015).

  • 24

    Watter, M., Springenberg, J. T., Boedecker, J. & Riedmiller, M. Embed to control: a locally linear latent dynamics model for control from raw images. In NIPS’15: Proc. 28th International Conference on Neural Information Processing Systems Vol. 2 (eds Cortes, C. et al.) 2746–2754 (MIT Press, 2015).

  • 25

    Ha, D. & Schmidhuber, J. Recurrent world models facilitate policy evolution. In NIPS’18: Proc. 32nd International Conference on Neural Information Processing Systems (eds Bengio, S. et al.) 2455–2467 (Curran Associates, 2018).

  • 26

    Gelada, C., Kumar, S., Buckman, J., Nachum, O. & Bellemare, M. G. DeepMDP: learning continuous latent space models for representation learning. In Proc. 36th International Conference on Machine Learning: Volume 97 of Proc. Machine Learning Research (eds Chaudhuri, K. & Salakhutdinov, R.) 2170–2179 (PMLR, 2019).

  • 27

    van Hasselt, H., Hessel, M. & Aslanides, J. When to use parametric models in reinforcement learning? Preprint at https://arxiv.org/abs/1906.05243 (2019).

  • 28

    Tamar, A., Wu, Y., Thomas, G., Levine, S. & Abbeel, P. Value iteration networks. Adv. Neural Inf. Process. Syst. 29, 2154–2162 (2016).

  • 29

    Silver, D. et al. The predictron: end-to-end learning and planning. In Proc. 34th International Conference on Machine Learning Vol. 70 (eds Precup, D. & Teh, Y. W.) 3191–3199 (JMLR, 2017).

  • 30

    Farahmand, A. M., Barreto, A. & Nikovski, D. Value-aware loss function for model-based reinforcement learning. In Proc. 20th International Conference on Artificial Intelligence and Statistics: Volume 54 of Proc. Machine Learning Research (eds Singh, A. & Zhu, J.) 1486–1494 (PMLR, 2017).

  • 31

    Farahmand, A. Iterative value-aware model learning. Adv. Neural Inf. Process. Syst. 31, 9090–9101 (2018).

  • 32

    Farquhar, G., Rocktaeschel, T., Igl, M. & Whiteson, S. TreeQN and ATreeC: differentiable tree planning for deep reinforcement learning. In International Conference on Learning Representations (2018).

  • 33

    Oh, J., Singh, S. & Lee, H. Value prediction network. Adv. Neural Inf. Process. Syst. 30, 6118–6128 (2017).

  • 34

    Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012).

  • 35

    He, K., Zhang, X., Ren, S. & Sun, J. Identity mappings in deep residual networks. In 14th European Conference on Computer Vision 630–645 (2016).

  • 36

    Hessel, M. et al. Rainbow: combining improvements in deep reinforcement learning. In Thirty-Second AAAI Conference on Artificial Intelligence (2018).

  • 37

    Schmitt, S., Hessel, M. & Simonyan, K. Off-policy actor-critic with shared experience replay. Preprint at https://arxiv.org/abs/1909.11583 (2019).

  • 38

    Azizzadenesheli, K. et al. Surprising negative results for generative adversarial tree search. Preprint at http://arxiv.org/abs/1806.05780 (2018).

  • 39

    Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).

  • 40

    OpenAI. OpenAI Five. OpenAI https://blog.openai.com/openai-five/ (2018).

  • 41

    Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).

  • 42

    Jaderberg, M. et al. Reinforcement learning with unsupervised auxiliary tasks. Preprint at https://arxiv.org/abs/1611.05397 (2016).

  • 43

    Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).

  • 44

    Kocsis, L. & Szepesvári, C. Bandit based Monte-Carlo planning. In European Conference on Machine Learning 282–293 (Springer, 2006).

  • 45

    Rosin, C. D. Multi-armed bandits with episode context. Ann. Math. Artif. Intell. 61, 203–230 (2011).

  • 46

    Schadd, M. P., Winands, M. H., Van Den Herik, H. J., Chaslot, G. M.-B. & Uiterwijk, J. W. Single-player Monte-Carlo tree search. In International Conference on Computers and Games 1–12 (Springer, 2008).

  • 47

    Pohlen, T. et al. Observe and look further: achieving consistent performance on Atari. Preprint at https://arxiv.org/abs/1805.11593 (2018).

  • 48

    Schaul, T., Quan, J., Antonoglou, I. & Silver, D. Prioritized experience replay. In International Conference on Learning Representations (2016).

  • 49

    Cloud TPU. Google Cloud https://cloud.google.com/tpu/ (2019).

  • 50

    Coulom, R. Whole-history rating: a Bayesian rating system for players of time-varying strength. In International Conference on Computers and Games 113–124 (2008).

  • 51

    Nair, A. et al. Massively parallel methods for deep reinforcement learning. Preprint at https://arxiv.org/abs/1507.04296 (2015).

  • 52

    Lanctot, M. et al. OpenSpiel: a framework for reinforcement learning in games. Preprint at http://arxiv.org/abs/1908.09453 (2019).
