EEE 448 Reinforcement Learning and Dynamic Programming

Markov chains. Markov decision processes. Dynamic programming: policy iteration, value iteration. Model-free reinforcement learning: Monte Carlo, temporal difference, Q-learning. Policy gradient methods. Model-based reinforcement learning: classical multi-armed bandits, stochastic multi-armed bandits, adversarial multi-armed bandits.
Credit units: 3. ECTS credit units: 5.
Prerequisite: (MATH 250 or MATH 255 or MATH 230) and (MATH 220 or MATH 224 or MATH 225 or MATH 241).



Last regenerated automatically on October 7, 2024 by OAC  Online Academic Catalog Software

