EEE 448 Reinforcement Learning and Dynamic Programming

Markov chains. Markov decision processes. Dynamic programming: policy iteration, value iteration. Model-free reinforcement learning: Monte Carlo, temporal difference, Q-learning. Policy gradient methods. Model-based reinforcement learning: classical multi-armed bandits, stochastic multi-armed bandits, adversarial multi-armed bandits.
Credit units: 3. ECTS credit units: 5. Prerequisite: (MATH 250 or MATH 255 or MATH 230) and (MATH 220 or MATH 224 or MATH 225 or MATH 241).



