Markov decision problem

From Wikipedia, the free encyclopedia

The Markov decision problem (MDP, also Markov decision process) is a model of decision problems, named after the Russian mathematician Andrei Andreyevich Markov, in which the utility of an agent depends on a sequence of decisions. The Markov assumption applies to the state transitions, i.e. the probability of reaching a state \(s'\) from a state \(s\) depends only on \(s\) and not on the predecessors of \(s\).

Formal definition

An MDP is a tuple \((S, A, P, R, p_0)\), where

  • \(S\) is the set of states,
  • \(A\) is the set of actions,
  • \(P : S \times A \times S \to [0, 1]\) is the action model (also transition probability), where \(P(s' \mid s, a)\) is the probability of getting into state \(s'\) from state \(s\) by executing action \(a\),
  • \(R : S \times S \to \mathbb{R}\) is the reward function, which assigns a reward to every transition from the previous state to the current state, and
  • \(p_0 : S \to [0, 1]\) is the start distribution, which indicates for each state how likely it is to start in that state.
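To make the definition concrete, here is a minimal sketch of this tuple as a plain data structure in Python; the class and field names are illustrative choices, not an established API.

```python
from dataclasses import dataclass

@dataclass
class MDP:
    states: list      # S: the set of states
    actions: list     # A: the set of actions
    transition: dict  # P: maps (s, a, s') to the probability P(s' | s, a)
    reward: dict      # R: maps (s, s') to the reward R(s, s')
    start: dict       # p0: maps s to the probability of starting in s
```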

Example

An example of an MDP is a robot that has to navigate a maze to reach a destination. The set of states consists of the possible positions of the robot, and the actions are the directions in which the robot can move.
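A toy instance of such a maze, using the MDP structure sketched above, might look as follows; the grid size, goal cell, and deterministic movement rule are all invented for illustration:

```python
# States are cells of a 2x2 grid; the agent starts at (0, 0), the goal is (1, 1).
states = [(x, y) for x in range(2) for y in range(2)]
actions = ["up", "down", "left", "right"]
moves = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def step(s, a):
    """Deterministic transition: move to the neighboring cell if it exists, else stay."""
    target = (s[0] + moves[a][0], s[1] + moves[a][1])
    return target if target in states else s

transition = {(s, a, step(s, a)): 1.0 for s in states for a in actions}
reward = {(s, s2): (1.0 if s2 == (1, 1) else 0.0) for s in states for s2 in states}
start = {s: (1.0 if s == (0, 0) else 0.0) for s in states}

maze = MDP(states=states, actions=actions, transition=transition,
           reward=reward, start=start)
```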

Solution

The solution to an MDP is a policy: a function that outputs, for each state, the action that maximizes the expected cumulative reward over time. Well-known solution methods include value iteration and reinforcement learning.
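A sketch of value iteration for the structure above, repeatedly applying the Bellman optimality update until the values converge; the discount factor gamma and the threshold eps are standard ingredients, but the concrete code is an illustration, not a reference implementation:

```python
def value_iteration(mdp, gamma=0.9, eps=1e-6):
    """Compute a value function and a greedy policy by value iteration."""
    V = {s: 0.0 for s in mdp.states}
    while True:
        delta = 0.0
        for s in mdp.states:
            # Bellman optimality update: best expected one-step return over actions.
            best = max(
                sum(p * (mdp.reward.get((s, s2), 0.0) + gamma * V[s2])
                    for (s0, a0, s2), p in mdp.transition.items()
                    if s0 == s and a0 == a)
                for a in mdp.actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            break
    # Derive the policy: in each state, pick the action maximizing expected return.
    policy = {
        s: max(mdp.actions, key=lambda a: sum(
            p * (mdp.reward.get((s, s2), 0.0) + gamma * V[s2])
            for (s0, a0, s2), p in mdp.transition.items()
            if s0 == s and a0 == a))
        for s in mdp.states
    }
    return V, policy
```

Applied to the maze instance above, value_iteration(maze) returns the value of each cell and, as the policy, a direction to move in each cell.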
