IE 456 Reinforcement Learning and Dynamic Programming
Markov chains. Markov decision processes. Dynamic programming: policy iteration, value iteration. Model-free reinforcement learning: Monte Carlo, Temporal Difference, Q-learning. Policy gradient methods. Model-based reinforcement learning: classical multi-armed bandits, stochastic multi-armed bandits, adversarial multi-armed bandits.
Credit units: 3, ECTS Credit units: 5, Prerequisite: (MATH 255 or MATH 230 or MATH 250) and (MATH 241 or MATH 225 or MATH 220 or MATH 224).
Last regenerated automatically on December 18, 2024 by OAC - Online Academic Catalog Software