The problem with this approach is that most real-world situations, and even some games, lack a simple set of rules governing how they work. Some researchers have therefore tried to work around the problem by modeling how a given game or environment affects an outcome, then using that knowledge to make a plan. The drawback of this system is that some domains are so complex that modeling every aspect is nearly impossible. That proved to be the case with most Atari games, for example.
In a way, MuZero combines the best of both worlds. Instead of modeling everything, it only tries to consider the factors that matter for decision making. As DeepMind points out, this is something humans do as well. When most people look out the window and see dark clouds forming on the horizon, they generally don’t think about things like condensation and pressure fronts. Instead, they think about how they should dress to stay dry if they leave the house. MuZero does something similar.
It takes three factors into account when making a decision. It considers the outcome of its previous decision, the position it is currently in and the best course of action to take next. This seemingly simple approach makes MuZero DeepMind’s most effective algorithm to date. In its tests, DeepMind found that MuZero was as good as AlphaZero at chess, Go and shogi, and better than all of its previous algorithms, including Agent57, at Atari games. It also found that the more time it gave MuZero to consider an action, the better the algorithm performed. DeepMind also conducted tests in which it limited the number of simulations MuZero could complete before committing to a move in Ms. Pac-Man. In those tests, it found that MuZero was still capable of getting good results.
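To make the three factors a little more concrete, here is a minimal Python sketch of budgeted planning on a toy number line. It is not DeepMind's implementation: MuZero's published design uses a tree search over states produced by learned neural networks, whereas this sketch swaps in a simple one-step lookahead with hand-written stand-ins. Every name here (`GOAL`, `ACTIONS`, `dynamics`, `value`, `policy`, `plan`) is hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

GOAL = 10  # hypothetical target position on a number line
ACTIONS: List[int] = [-1, 1, 2]  # hypothetical moves: step back, step, jump


def dynamics(state: int, action: int) -> Tuple[int, float]:
    """Stand-in for a learned model: predicts the next state and the
    immediate reward, i.e. the outcome of the decision just taken."""
    next_state = state + action
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def value(state: int) -> float:
    """Stand-in for a learned value estimate: how good a position is."""
    return float(-abs(GOAL - state))


def policy(state: int) -> Dict[int, float]:
    """Stand-in for a learned policy: which action looks best at first
    glance. Deliberately imperfect, so extra simulations can correct it."""
    return {-1: 0.5, 1: 0.3, 2: 0.2}


def plan(state: int, num_simulations: int, discount: float = 0.99) -> int:
    """Pick an action under a simulation budget.

    Actions are tried in order of policy preference, one simulation each.
    Each simulation scores an action as reward + discount * value(next).
    """
    prior = policy(state)
    candidates = sorted(ACTIONS, key=lambda a: prior[a], reverse=True)
    best_action, best_score = candidates[0], float("-inf")
    for action in candidates[: max(1, num_simulations)]:
        next_state, reward = dynamics(state, action)
        score = reward + discount * value(next_state)
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

With a budget of one simulation, `plan(0, 1)` simply follows the flawed policy and steps backward. With a budget of three, `plan(0, 3)` checks every move and picks the jump toward the goal, which mirrors in miniature the finding that more thinking time leads to better play.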
Getting high scores in Atari games is great, but what about the practical applications of DeepMind’s latest research? In a word, they could be groundbreaking. Although we are not there yet, MuZero is the closest researchers have come to developing a general-purpose algorithm. The subsidiary says MuZero’s learning capabilities may one day help it tackle complex problems in areas such as robotics, where there are no straightforward rules.