The SparkDev AI.
Author: Emtanis Shoukry
What are we doing?
The goal of the Pacman and Ghosts teams is to use Reinforcement Learning (RL) to teach our agents (Pacman and each of the Ghosts) how to play the game well. We're aiming to have Pacman learn how to maximize his score, and the Ghosts learn the best way to prevent Pacman from winning.
What is Reinforcement Learning?
Reinforcement Learning in its simplest form is reward-based learning. Every action that our agent takes earns a positive or negative reward, hence the word "reinforcement." There's a lot of nuance in defining these reward values, though, and it will take a lot of testing to reach the desired behavior for each of our agents. That's not to say that we'll be forcing certain behaviors on our agents - that would defeat the purpose of letting them learn and the fun of seeing what they do by themselves - but we'll make sure that Pacman doesn't keep running into walls and that the Ghosts don't just run in circles all the time.
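To make the idea of reward values a bit more concrete, here is a minimal sketch of what a reward table for Pacman could look like. The event names and every number in it are made up for illustration; the real values are exactly the thing we expect to tune through testing.

```python
# Hypothetical reward values for Pacman. These numbers are placeholders,
# not the project's real settings.
REWARDS = {
    "moved": -1,             # small cost per step so Pacman doesn't dawdle
    "ate_pellet": 10,
    "ate_power_pellet": 50,
    "ate_ghost": 200,
    "hit_wall": -5,          # discourages running into walls
    "caught_by_ghost": -500,
}


def pacman_reward(events):
    """Add up the reward for every event that happened on this step."""
    return sum(REWARDS[event] for event in events)


print(pacman_reward(["moved", "ate_pellet"]))  # 9
print(pacman_reward(["moved", "hit_wall"]))    # -6
```

Notice that nothing here tells Pacman what to do. Running into a wall just costs a few points, and it is up to the agent to figure out that it should stop doing that.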
Another cool thing that we're aiming to achieve is for each of the ghosts to have a learned personality. Let's say one of our ghosts over the course of the game keeps getting eaten by Pacman when he gets the power pellet. In theory, that ghost's learned behavior should be "avoid Pacman," and it will prioritize a different objective (maybe helping out by learning the map layout rather than specifically going for Pacman).
That Picture at the Top
Here's a short explanation of that picture at the top of the page, now that you have some understanding of the theory behind RL. Pacman and the Ghosts are our "Agents," as mentioned before, and inside each agent is a deep neural network (DNN), the driving force behind our project. The DNN is responsible for outputting the action our agent takes, given some input.
In this case, as shown in the picture, the input comes from the environment: the next state and the reward received when a certain action is taken. The state is sort of like a snapshot of the situation the agent is in. In our case it would be things like Pacman's location, whether he can see Ghosts, whether there are pellets around him, whether the Ghosts can see Pacman, etc. The reward is the feedback Pacman or a Ghost gets for taking a specific action. The DNN takes those, does its thing (which we'll go into in another blog post, since it will take quite a bit to explain!), and outputs the next action our agent should take. As it does this over and over, it learns from the rewards it receives whether each action was beneficial or not.
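The loop described above (state in, action out, reward back) can be sketched in a few lines. This is only a toy under loud assumptions: `CorridorEnv` is a made-up one-dimensional "maze" with a single pellet at the far end, `TableAgent` uses a simple lookup table (tabular Q-learning) as a stand-in for the DNN, and all reward and learning-rate numbers are placeholders rather than anything from the actual project.

```python
import random


class CorridorEnv:
    """Toy stand-in for the game: a 1-D corridor with a pellet at the far end."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # the state is just Pacman's position

    def step(self, action):
        """Apply an action (0 = left, 1 = right); return (state, reward, done)."""
        new_position = self.position + (1 if action == 1 else -1)
        if new_position < 0:
            return self.position, -5, False  # bumped into the wall
        self.position = new_position
        if self.position == self.length - 1:
            return self.position, 10, True   # ate the pellet, episode over
        return self.position, -1, False      # ordinary step


class TableAgent:
    """Tabular Q-learning agent; a placeholder for the DNN, not the real thing."""

    def __init__(self, n_states, n_actions=2, lr=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.lr, self.gamma, self.epsilon = lr, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        # Occasionally try a random action so the agent keeps exploring.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q[state]))
        values = self.q[state]
        return values.index(max(values))

    def learn(self, state, action, reward, next_state, done):
        # Nudge the estimate toward the reward plus the discounted best next value.
        best_next = 0.0 if done else max(self.q[next_state])
        target = reward + self.gamma * best_next
        self.q[state][action] += self.lr * (target - self.q[state][action])


def train(episodes=200, max_steps=100):
    env = CorridorEnv()
    agent = TableAgent(n_states=env.length)
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            action = agent.act(state)                      # agent picks an action
            next_state, reward, done = env.step(action)    # environment responds
            agent.learn(state, action, reward, next_state, done)
            state = next_state
            if done:
                break
    return agent


agent = train()
# Preferred action in each non-terminal square (0 = left, 1 = right).
greedy_actions = [row.index(max(row)) for row in agent.q[:-1]]
print(greedy_actions)
```

On this toy corridor the agent should end up preferring to move right in every square, since that is where the pellet is. The real project swaps the lookup table for a DNN because the actual game has far too many possible states to list one by one.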